I am an Applied Scientist at Amazon Agentic AI. I received my Ph.D. from KAIST.
Humans are inherently multi-modal learners, naturally understanding the world by looking (vision), listening (audio), and communicating (language). I am passionate about advancing machine intelligence to mirror this ability, enabling systems to understand the world holistically and generate faithful, human-centered content.
My work explores, but is not limited to, the following:
sangminw [at] amazon.com
shmwoo9395 [at] gmail.com
2795 Augustine Dr, Santa Clara, CA 95054, United States
Ph.D. @ KAIST, 2025
on "Deep Visual and Multimodal Generation: Advancing Diffusion Models and Large Vision Language Models"
M.S. @ GIST, 2021
on "Learning to Detect Visual Relationships in Images and Videos"
B.S. @ KNU, 2019
25.06 I am starting my new chapter at Amazon!
25.06 Selected as an Outstanding Reviewer at CVPR 2025!
25.05 1 paper accepted to ACL 2025 Findings!
25.05 Successfully defended my PhD!
25.04 1 paper accepted to CVIU!
25.02 1 paper accepted to CVPR 2025!
25.01 1 paper accepted to NAACL 2025 Main!
24.12 1 paper accepted to AAAI 2025!
24.09 Excited to keep collaborating with the team remotely!
24.09 I had a fantastic summer internship with Amazon!
24.07 3 papers accepted to ECCV 2024!
24.06 I joined Amazon Bedrock as a summer intern!
CVIU 2025
[ paper ]
Arxiv 2025
[ paper ]
NAACL 2025
[ paper ]
ECCV 2024
[ paper ]
ECCV 2024
[ paper ]
CVPR 2024
Featured by HuggingFace Daily Papers
Finalist, Qualcomm Innovation Fellowship 2024 Korea
VCIP 2023 Oral presentation
[ paper ]
VCIP 2023
[ paper ]
CVIU 2023
[ paper ]
ICCV 2023
Invited Paper Talk @ CARAI Workshop
[ paper ]
WACV 2023
[ paper ]
ICIP 2022
[ paper ]
Applied Sciences 2022
[ paper ]