Sangmin Woo

Applied Scientist @ Amazon

I am an Applied Scientist at Amazon Agentic AI. I received my Ph.D. degree at KAIST.

Humans are inherently multi-modal learners, naturally understanding the world by looking (vision), listening (audio), and communicating (language). I am passionate about advancing machine intelligence to mirror this ability, enabling systems to understand the world holistically and generate faithful, human-centered content.

My work explores the following, but not limited to:

Multi-modal AI: Vision + {Language, Audio, etc.} Multi-modal
Generative AI (Vision Language Models, Large Language Models, Diffusion Models) Gen AI
Visual Understanding Video Image

Contact

sangminw [at] amazon.com

shmwoo9395 [at] gmail.com
2795 Augustine Dr, Santa Clara, CA 95054, United States

Education

Ph.D. @ KAIST, 2025

on "Deep Visual and Multimodal Generation: Advancing Diffusion Models and Large Vision Language Models"
M.S. @ GIST, 2021

on "Learning to Detect Visual Relationships in Images and Videos"
B.S. @ KNU, 2019

News

25.08 1 paper accepted to ESWA!
25.08 1 paper accepted to EMNLP 2025 Main!
25.06 I am starting my new chapter at Amazon! 🧑🏻‍💻
25.06 Selected as an Outstanding Reviewer at CVPR 2025!
25.05 1 paper accepted to ACL 2025 Findings!
25.05 Successfully defended my PhD! 🎓
25.04 1 paper accepted to CVIU!
25.02 1 paper accepted to CVPR 2025!
25.01 1 paper accepted to NAACL 2025 Main!
24.12 1 paper accepted to AAAI 2025!
24.09 Excited to keep collaborating with the team remotely!
24.09 I had a fantastic summer internship with Amazon!
24.07 3 papers accepted to ECCV 2024!
24.06 I joined Amazon Bedrock as a summer intern!

Experiences

Amazon Agentic AI (Santa Clara, CA, US)

June 2025 - Present

Applied Scientist
Amazon Bedrock (Remote)

Sep 2024 - Mar 2025

Applied Scientist Intern
Amazon Bedrock (Santa Clara, CA, US)

Jun 2024 - Sep 2024

Applied Scientist Intern
NAVER LABS (Suwon, Korea)

Apr 2023 - Aug 2023

Research Intern

Publications

2025

Multi-modalVideo

What and When to Look?: Temporal Span Proposal Network for Video Visual Relation Detection

Sangmin Woo, Junhyug Noh, Kangil Kim

ESWA 2025 (IF=8.6)

[ paper | code ]
Gen AI

A Systematic Survey of Automatic Prompt Optimization Techniques

Kiran Ramnath, Kang Zhou, Sheng Guan, Soumya Smruti Mishra, Xuan Qi, Zhengyuan Shen, Shuai Wang, Sangmin Woo, Sullam Jeoung,
Yawei Wang, Haozhu Wang, Han Ding, Yuzhe Lu, Zhichao Xu, Yun Zhou, Balasubramaniam Srinivasan, Qiaojing Yan, Yueyan Chen,
Haibo Ding, Panpan Xu, Lin Lee Cheong

EMNLP 2025 Main

[ paper ]
Gen AIMulti-modal

Don’t Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models

Sangmin Woo*, Donguk Kim*, Jaehyuk Jang*, Yubin Choi, Changick Kim (*Equal Contribution)

ACL 2025 Findings

[ paper | code | project ]
Multi-modal Video

Modality Mixer Exploiting Complementary Information for Multi-modal Action Recognition

Sumin Lee, Sangmin Woo, Muhammad Adi Nugroho, Changick Kim

CVIU 2025 (IF=4.8)

[ paper ]
Learning

Parameter Efficient Mamba Tuning via Projector-targeted Diagonal-centric Linear Transformation

Seokil Ham, Hee-Seon Kim, Sangmin Woo, Changick Kim

CVPR 2025

[ paper | code ]
Gen AIMulti-modal

Black-Box Visual Prompt Engineering for Mitigating Object Hallucination in Large Vision Language Models

Sangmin Woo, Kang Zhou, Yun Zhou, Shuai Wang, Sheng Guan, Haibo Ding, Lin Lee Cheong

NAACL 2025

[ paper ]
Gen AI

Diffusion Model Patching via Mixture-of-Prompts

Seokil Ham*, Sangmin Woo*, Jinyoung Kim, Hyojun Go, Byeongjun Park, Changick Kim (*Equal Contribution)

AAAI 2025

[ paper | code | project ]

2024

Gen AIMulti-modal

RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language Models

Sangmin Woo*, Jaehyuk Jang*, Donguk Kim*, Yubin Choi, Changick Kim (*Equal Contribution)

Arxiv 2024

[ paper | code | project ]
Multi-modal Video

Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition

Muhammad Adi Nugroho, Sangmin Woo, Sumin Lee, Jinyoung Park, Yooseung Wang, Donguk Kim, Changick Kim

ECCV 2024

[ paper ]
Video

Spatio-Temporal Proximity-Aware Dual-Path Model for Panoramic Activity Recognition

Sumin Lee, Yooseung Wang, Sangmin Woo, Changick Kim

ECCV 2024

[ paper ]
Gen AI

Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts

Byeongjun Park, Hyojun Go, Jinyoung Kim, Sangmin Woo, Seokil Ham, Changick Kim

ECCV 2024

[ paper | code | project ]
Gen AI

HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D

Sangmin Woo*, Byeongjun Park*, Hyojun Go, Jinyoung Kim, Changick Kim (*Equal Contribution)

CVPR 2024
Featured by HuggingFace Daily Papers
Finalist, Qualcomm Innovation Fellowship 2024 Korea

[ paper | code | project | demo ]
Gen AI

Denoising Task Routing for Diffusion Models

Byeongjun Park*, Sangmin Woo*, Hyojun Go*, Jinyoung Kim*, Changick Kim (*Equal Contribution)

ICLR 2024

[ paper | code | project ]
Multi-modal Video

Sketch-based Video Object Localization

Sangmin Woo, So-Yeong Jeon, Jinyoung Park, Minji Son, Sumin Lee, Changick Kim

WACV 2024

[ paper | code ]

2023

Multi-modal Video

AHFu-Net: Align, Hallucinate, and Fuse Network for Missing Multimodal Action Recognition

Muhammad Adi Nugroho, Sangmin Woo, Sumin Lee, Changick Kim

VCIP 2023 Oral presentation

[ paper ]
Multi-modal Video

Multi-modal Social Group Activity Recognition in Panoramic Scene

Donguk Kim, Sumin Lee, Sangmin Woo, Jinyoung Park, Muhammad Adi Nugroho, Changick Kim

VCIP 2023

[ paper ]
Multi-modal Video

Cross-Modal Alignment and Translation for Missing Modality Action Recognition

Yeonju Park, Sangmin Woo, Sumin Lee, Muhammad Adi Nugroho, Changick Kim

CVIU 2023 (IF=4.8)

[ paper ]
Multi-modal Video

Audio-Visual Glance Network for Efficient Video Recognition

Muhammad Adi Nugroho, Sangmin Woo, Sumin Lee, Changick Kim

ICCV 2023
Invited Paper Talk @ CARAI Workshop

[ paper ]
Multi-modal Video

Towards Good Practices for Missing Modality Robust Action Recognition

Sangmin Woo, Sumin Lee, Yeonju Park, Muhammad Adi Nugroho, Changick Kim

AAAI 2023 Oral presentation

[ paper | code ]
Multi-modal Video

Modality Mixer for Multi-modal Action Recognition

Sumin Lee, Sangmin Woo, Yeonju Park, Muhammad Adi Nugroho, Changick Kim

WACV 2023

[ paper ]

~ 2022

Multi-modal Video

Explore and Match: Bridging Proposal-Based and Proposal-Free with Transformer for Sentence Grounding in Videos

Sangmin Woo, Jinyoung Park, Inyong Koo, Sumin Lee, Minki Jeong, Changick Kim

Arxiv 2022
Finalist, 29th HumanTech Paper Award, Samsung Electronics Co., Ltd

[ paper | code ]
Multi-modalImage

Tackling the Challenges in Scene Graph Generation with Local-to-Global Interactions

Sangmin Woo, Junhyug Noh, Kangil Kim

TNNLS 2022 (IF=14.2)

[ paper | code ]
Image

Temporal Flow Mask Attention for Open-Set Long-Tailed Recognition of Wild Animals in Camera-Trap Images

Jeongsoo Kim, Sangmin Woo, Byeongjun Park, Changick Kim

ICIP 2022

[ paper ]
Learning

Impact of Sentence Representation Matching in Neural Machine Translation

Heeseung Jung, Kangil Kim, Jong-Hun Shin, Seung-Hoon Na, Sangkeun Jung, Sangmin Woo

Applied Sciences 2022 (IF=2.8)

[ paper ]
Learning

Revisiting Dropout: Escaping Pressure for Training Neural Networks with Multiple Costs

Sangmin Woo, Kangil Kim, Junhyug Noh, Jong-Hun Shin, and Seung-Hoon Na

Electronics 2021 (IF=2.6)

[ paper | code ]

Academic Activities

Reviewer at CVPR (2024 ~), ICCV (2025~), ECCV (2024 ~)
Reviewer at ICLR (2024 ~), NeurIPS (2024 ~), ICML (2025 ~), AAAI (2023 ~)
Reviewer at IEEE TNNLS (2023 ~), TMLR (2025 ~), TIP (2025 ~)

Awards & Honors

Outstanding Reviewer (711/12,593), IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun, 2025
Fianlist ($ 1,000), Qualcomm Innovation Fellowship 2024 Korea. Dec, 2024
Invited Paper Talk at CARAI Workshop. Center for Applied Research in Artificial Intelligence. Oct, 2023
Finalist, 29th HumanTech Paper Award, Samsung Electronics Co., Ltd. Dec, 2022
Top Award ($ 10,000), LG Electronics Robot Contest, LG Electronics Co., Ltd. Dec, 2021
Excellence Award ($ 500), Creative Space G A.I&IoT Makerthon, GIST. Nov, 2019

Patents

Automated Visual Prompt Routing for Visual Prompt Engineering Without Access to Model Internals (US Patent)
Group Activity Recognition Apparatus and Method using RGB Videos and LiDAR Data (KR Patent: 10-2741168)
Method and Device for Inferring Dynamic Relationship between Objects in Video (KR Patent: 10-2590916)
Human Activity Recognition Apparatus, Human Activity Recognition Method and Computer Program for Classifying Activity Categories of User (KR Patent: 10-2515065)
Scene Graph Generation Apparatus (KR Patent 10-2254768)