My research focuses on advancing multimodal artificial intelligence systems that can understand, generate, and interact with complex real-world data across video, language, and 3D spaces.
I am particularly interested in developing AI systems for video understanding and video generation, bridging the gap between visual perception and language understanding through state-of-the-art Vision-Language Models (VLMs).
Developing AI systems for long video analysis with instructed learnable memory. Our ReWind model (CVPR 2025) enables comprehensive understanding of extended video content through novel memory mechanisms.
Long Video Analysis, Memory Models, Temporal Reasoning

Building production-scale video generation models using Vision-Language Models and diffusion architectures. Focus on controllable generation and high-quality synthetic video production.
Diffusion Models, VLMs, Generative AI

Expertise in state-of-the-art VLMs including LLaVA, CLIP, QwenVL, and LayoutVLM. Developing multimodal perception systems that bridge vision and language understanding.
LLaVA, CLIP, QwenVL, Multimodal AI

Research on OmniLiDAR: Controllable and Multi-Sensor 4D LiDAR Generation (submitted to ECCV 2026). Focus on 3D scene reconstruction, sensor fusion, and autonomous perception systems.
4D LiDAR, Sensor Fusion, 3D Reconstruction

Developing models that combine visual perception, language understanding, and action prediction for robotics and autonomous systems. Enabling AI systems to understand and execute complex instructions.
Robotics, Action Prediction, Embodied AI

Researching state-of-the-art diffusion models for creating and manipulating visual content. Applications in image synthesis, video generation, and creative AI tools.
Diffusion, Image Synthesis, Generative AI

Deep learning for motion sensor data analysis and trajectory reconstruction using Temporal Convolutional Networks (TCNs). Applications in pen trajectory reconstruction and motion analysis.
TCN, Time Series, Sensor Fusion

Multimodal perception models for document analysis, handwriting recognition, and historical document processing. Deploying OCR capabilities across multiple languages.
OCR, Handwriting, Document Analysis

2023 - Present
Leading research on multimodal AI systems, with a focus on: