Hi! This is Hitesh. Welcome to my space! I am currently an AI Research Engineer at AMD, working on a breadth of problems spanning diffusion models, LLMs, VLMs, and VLAs, with a focus on efficiency and performance. Previously, I was a Research Fellow at Microsoft Research India, where I was advised by Jianwei Yang at Microsoft Research Redmond and investigated the capabilities of diffusion models in image editing and video/GIF generation.

I'm deeply fascinated by the synergy between different types of data, including text, images, videos, audio, and more, which mirror a subset of the human senses. Eventually, I want to work at this broad intersection. As a stepping stone towards this goal, I am currently studying the interaction between vision, language, and robot actions.

News & Updates

Feb 2026
DUET-VLM accepted at CVPR 2026

Our paper on dual-stage efficient token reduction for VLMs has been accepted at IEEE/CVF CVPR 2026.

Paper
Feb 2026
Workshop on Fine-Tuning AI Models with Unsloth at IIT Delhi

Conducted a hands-on workshop on fine-tuning AI models using Unsloth at IIT Delhi.

Workshop
Jun 2025
Workshop on Fine-Tuning AI Models with Meta at IIT Bombay

Conducted a workshop on fine-tuning AI models with Meta at IIT Bombay.

Workshop
Jun 2025
Workshop on Fine-Tuning AI Models with Meta at IISc Bangalore

Conducted a workshop on fine-tuning AI models with Meta at IISc Bangalore.

Workshop
Jul 2024
Joined AMD as AI Research Engineer

Joined AMD to work on the efficiency and performance of diffusion models, LLMs, VLMs, and VLAs.

Milestone
Jul 2024
Pix2Gif accepted at ECCV 2024

Our paper on motion-guided diffusion for GIF generation has been accepted at the European Conference on Computer Vision (ECCV) 2024.

Paper
Apr 2024
Pix2Gif accepted at CVPR 2024 AI4CC Workshop

Pix2Gif was accepted at the AI for Content Creation (AI4CC) Workshop at CVPR 2024, Seattle.

Paper

Publications

DUET-VLM: Dual-Stage Unified Efficient Token Reduction for VLM Training and Inference
Aditya Kumar Singh*, Hitesh Kandala*, Pratik Prabhanjan Brahma, Zicheng Liu, Emad Barsoum (* Equal Contribution)
CVPR 2026 | IEEE/CVF Conference on Computer Vision and Pattern Recognition
pdf | abstract | code | cite

Pix2Gif: Motion-Guided Diffusion for GIF Generation
Hitesh Kandala, Jianfeng Gao, Jianwei Yang
ECCV 2024 | European Conference on Computer Vision
pdf | abstract | project | code | proceedings | cite

Beyond Boundaries: A Novel Data-Augmentation Discourse for Open Domain Generalization
Shirsha Bose, Ankit Jha, Hitesh Kandala, Biplab Banerjee
TMLR | Transactions on Machine Learning Research
paper | cite

Exploring Transformer and Multi Label Classification for Remote Sensing Image Captioning
Hitesh Kandala, Sudipan Saha, Biplab Banerjee, Xiao Xiang Zhu
IEEE GRSL | IEEE Geoscience and Remote Sensing Letters
paper | cite

Multi-Stage Semantic Graph Embeddings for Compositional Zero-Shot Learning
Hitesh Kandala, Ruchika Chavhan, Ushasi Chaudhuri, Biplab Banerjee
paper | cite

IIT Bombay
2018 - 2022
Microsoft Research
2022 - 2024
AMD
2024 - Present