Wassim Swaileh, PhD

AI Engineer - Robotics, Autonomous Systems & Multimodal AI

swaileh@hotmail.com | +358 40 185 3001 | Helsinki, Finland

About Me

AI Engineer with 8+ years of post-PhD experience building production AI models and training pipelines for multimodal and autonomous systems. Expertise in vision-language models (VLMs), vision-language-action (VLA) models, 3D scene reconstruction, and large-scale ML infrastructure.

Currently working at Huawei Suomi Finland Research Center as a Senior Researcher in Multimodal AI, where I:

Build and ship production-scale video generation models using VLMs and diffusion architectures
Design end-to-end training pipelines with containerized environments on the ModelArts MLOps platform
Co-developed ReWind (CVPR 2025), a large language model for long video understanding
Spearhead OCR capabilities, deploying multimodal perception models across multiple languages

My research spans:

Video Understanding & Generation
Vision-Language Models (LLaVA, CLIP, QwenVL)
Vision-Language-Action (VLA) Models
Generative Diffusion Models

4D LiDAR Generation & Autonomous Systems
3D Scene Reconstruction
Kinematic Time Series Analysis
Document AI & OCR

Core Expertise

Video AI

Video understanding, generation, and long video analysis with memory models

Multimodal AI

VLMs, VLA models, and multimodal perception systems

Autonomous Systems

4D LiDAR generation, sensor fusion, and 3D reconstruction

Career Highlights

2023 - Present

Senior Researcher - Multimodal AI

Huawei Suomi Finland Research Center, Helsinki

Building production-scale video generation models and co-developing ReWind (CVPR 2025) for long video understanding.

2021 - 2023

R&D Engineer - Robotics & Sensor AI

IRISA Lab UMR 6074, Rennes, France

Deep learning pipelines for kinematic time series analysis and trajectory reconstruction using temporal convolutional networks (TCNs).

2019 - 2021

Lecturer & Researcher - Deep Learning & 3D AI

ETIS Lab UMR 8051, Cergy Paris University

UNet-based architectures for visual perception and 3D modeling of ancient maps.

2018 - 2019

R&D Engineer - Document AI

LITIS Lab EA 4108, Rouen, France

Production DNN pipelines for named entity extraction from historical documents (EURHISFIRM project).

Key Achievements

CVPR 2025 Publication

Tier-1 Conference

Co-developed ReWind, a large language model for long video understanding with instructed learnable memory.

Patent Submitted

AI-Journalist

Apparatus to Generate Media Contents Conditioned to User Preferences Settings.

Get In Touch

Interested in collaborating or learning more about my work?

Contact Me

Professional Summary