Hitesh Kandala

Hi! This is Hitesh, welcome to my space! I am currently an AI Research Engineer at AMD, working on a range of problems spanning diffusion models, LLMs, VLMs, and VLAs, with a focus on efficiency and performance. Previously, I was a Research Fellow at Microsoft Research India, where I was advised by Jianwei Yang at Microsoft Research Redmond and investigated the capabilities of diffusion models in image editing and video/GIF generation.

I'm deeply fascinated by the synergy between different types of data, including text, images, videos, and audio, which mirror a subset of the human senses. Eventually, I want to work at this broad intersection. As a stepping stone towards this goal, I am currently studying the interaction between vision, language, and robot actions.

Feb 2026

DUET-VLM accepted at CVPR 2026

Our paper on dual-stage efficient token reduction for VLMs has been accepted at IEEE/CVF CVPR 2026.

Paper

Feb 2026

Workshop on Fine-Tuning AI Models with Unsloth at IIT Delhi

Conducted a hands-on workshop on fine-tuning AI models using Unsloth at IIT Delhi.

Workshop

Jun 2025

Workshop on Fine-Tuning AI Models with Meta at IIT Bombay

Conducted a workshop on fine-tuning AI models with Meta at IIT Bombay.

Workshop

Jun 2025

Workshop on Fine-Tuning AI Models with Meta at IISc Bangalore

Conducted a workshop on fine-tuning AI models with Meta at IISc Bangalore.

Workshop

Jul 2024

Joined AMD as AI Research Engineer

Joined AMD to work on the efficiency and performance of diffusion models, LLMs, VLMs, and VLAs.

Milestone

Jul 2024

Pix2Gif accepted at ECCV 2024

Our paper on motion-guided diffusion for GIF generation has been accepted at the European Conference on Computer Vision (ECCV) 2024.

Paper

Apr 2024

Pix2Gif accepted at CVPR 2024 AI4CC Workshop

Pix2Gif was accepted at the AI for Content Creation (AI4CC) Workshop at CVPR 2024, Seattle.

Paper

DUET-VLM: Dual-Stage Unified Efficient Token Reduction for VLM Training and Inference
Aditya Kumar Singh*, Hitesh Kandala*, Pratik Prabhanjan Brahma, Zicheng Liu, Emad Barsoum (* Equal Contribution)
CVPR 2026 | IEEE/CVF Conference on Computer Vision and Pattern Recognition

Beyond Boundaries: A Novel Data-Augmentation Discourse for Open Domain Generalization
Shirsha Bose, Ankit Jha, Hitesh Kandala, Biplab Banerjee
TMLR | Transactions on Machine Learning Research

Exploring Transformer and Multi Label Classification for Remote Sensing Image Captioning
Hitesh Kandala, Sudipan Saha, Biplab Banerjee, Xiao Xiang Zhu
IEEE GRSL | IEEE Geoscience and Remote Sensing Letters

Multi-Stage Semantic Graph Embeddings for Compositional Zero-Shot Learning
Hitesh Kandala, Ruchika Chavhan, Ushasi Chaudhuri, Biplab Banerjee

IIT Bombay

2018 - 2022

Microsoft Research

2022 - 2024

AMD

2024 - Present