
Sangmin Woo

Ph.D. Candidate in EE @ KAIST


I am currently pursuing a Ph.D. degree in Electrical Engineering at KAIST.

Humans are inherently multi-modal learners, with vision playing a pivotal role in shaping our understanding of the world. I am passionate about bridging the gap between machine perception and human-level understanding by harnessing the potential of multi-modal learning.

My work explores the following areas, including but not limited to:

  • Multi-modal Learning
    • (High-level) Vision + X ∈ {Language, Audio, Sketch, etc.}
    • (Low-level) RGB + X ∈ {Depth, IR, Flow, etc.}
  • Video/Image Understanding
  • Generation & Diffusion Models

Contact

  • smwoo95 [at] kaist.ac.kr

    shmwoo9395 [at] gmail.com

  • 291, Daehak-ro, Yuseong-gu, Daejeon, Republic of Korea 34141

Education

  • Ph.D. in EE, KAIST, 2025 (Expected)

  • M.S. in EECS, GIST, 2021

    Thesis: "Learning to Detect Visual Relationships in Images and Videos"

  • B.S. in EE, KNU, 2019

News

  • I am excited to keep collaborating with the team remotely!
  • I had a fantastic summer internship with Amazon AWS AI!
  • Three papers were accepted to ECCV 2024!
  • I joined Amazon AWS AI as a summer intern!

Research Experiences

  • Amazon AWS AI (Remote)
    Sep 2024 - Mar 2025

    Research Intern
  • Amazon AWS AI (Santa Clara, CA, US)
    Jun 2024 - Sep 2024

    Research Intern
  • NAVER LABS (Suwon, Korea)
    Apr 2023 - Aug 2023

    Research Intern

Publications


    2024
  • Multi-modal

    RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in LVLMs

    Sangmin Woo*, Jaehyuk Jang*, Donguk Kim*, Changick Kim (*Equal Contribution)

    arXiv 2024

    [ paper | code | project ]

  • Multi-modal

    Don’t Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models

    Sangmin Woo*, Donguk Kim*, Jaehyuk Jang*, Changick Kim (*Equal Contribution)

    arXiv 2024

    [ paper | code | project ]

  • Generation

    Diffusion Model Patching via Mixture-of-Prompts

    Seokil Ham*, Sangmin Woo*, Jinyoung Kim, Hyojun Go, Byeongjun Park, Changick Kim (*Equal Contribution)

    arXiv 2024

    [ paper | code | project ]

  • Multi-modal Video

    Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition

    Muhammad Adi Nugroho, Sangmin Woo, Sumin Lee, Jinyoung Park, Yooseung Wang, Donguk Kim, Changick Kim

    ECCV 2024

    [ paper ]

  • Video

    Spatio-Temporal Proximity-Aware Dual-Path Model for Panoramic Activity Recognition

    Sumin Lee, Yooseung Wang, Sangmin Woo, Changick Kim

    ECCV 2024

    [ paper ]

  • Generation

    Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts

    Byeongjun Park, Hyojun Go, Jinyoung Kim, Sangmin Woo, Seokil Ham, Changick Kim

    ECCV 2024

    [ paper | code | project ]

  • Generation

    HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D

    Sangmin Woo*, Byeongjun Park*, Hyojun Go, Jinyoung Kim, Changick Kim (*Equal Contribution)

    CVPR 2024
    Featured by Hugging Face Daily Papers

    [ paper | code | project | demo ]

  • Generation

    Denoising Task Routing for Diffusion Models

    Byeongjun Park*, Sangmin Woo*, Hyojun Go*, Jinyoung Kim*, Changick Kim (*Equal Contribution)

    ICLR 2024

    [ paper | code | project ]

  • Multi-modal Video

    Sketch-based Video Object Localization

    Sangmin Woo, So-Yeong Jeon, Jinyoung Park, Minji Son, Sumin Lee, Changick Kim

    WACV 2024

    [ paper | code ]


    2023
  • Multi-modal Video

    AHFu-Net: Align, Hallucinate, and Fuse Network for Missing Multimodal Action Recognition

    Muhammad Adi Nugroho, Sangmin Woo, Sumin Lee, Changick Kim

    VCIP 2023 (Oral Presentation)

    [ paper ]

  • Multi-modal Video

    Multi-modal Social Group Activity Recognition in Panoramic Scene

    Donguk Kim, Sumin Lee, Sangmin Woo, Jinyoung Park, Muhammad Adi Nugroho, Changick Kim

    VCIP 2023

    [ paper ]

  • Multi-modal Video

    Cross-Modal Alignment and Translation for Missing Modality Action Recognition

    Yeonju Park, Sangmin Woo, Sumin Lee, Muhammad Adi Nugroho, Changick Kim

    CVIU 2023

    [ paper ]

  • Multi-modal Video

    Modality Mixer Exploiting Complementary Information for Multi-modal Action Recognition

    Sumin Lee, Sangmin Woo, Muhammad Adi Nugroho, Changick Kim

    Under review at TIP

    [ paper ]

  • Multi-modal Video

    Audio-Visual Glance Network for Efficient Video Recognition

    Muhammad Adi Nugroho, Sangmin Woo, Sumin Lee, Changick Kim

    ICCV 2023
    Invited Paper Talk @ CARAI Workshop

    [ paper ]

  • Multi-modal Video

    Towards Good Practices for Missing Modality Robust Action Recognition

    Sangmin Woo, Sumin Lee, Yeonju Park, Muhammad Adi Nugroho, Changick Kim

    AAAI 2023 (Oral Presentation)

    [ paper | code ]

  • Multi-modal Video

    Modality Mixer for Multi-modal Action Recognition

    Sumin Lee, Sangmin Woo, Yeonju Park, Muhammad Adi Nugroho, Changick Kim

    WACV 2023

    [ paper ]


    2022
  • Multi-modal Video

    Explore and Match: Bridging Proposal-Based and Proposal-Free with Transformer for Sentence Grounding in Videos

    Sangmin Woo, Jinyoung Park, Inyong Koo, Sumin Lee, Minki Jeong, Changick Kim

    arXiv 2022
    Finalist, 29th HumanTech Paper Award, Samsung Electronics Co., Ltd

    [ paper | code ]

  • Multi-modal Image

    Tackling the Challenges in Scene Graph Generation with Local-to-Global Interactions

    Sangmin Woo, Junhyug Noh, Kangil Kim

    TNNLS 2022

    [ paper | code ]

  • Image

    Temporal Flow Mask Attention for Open-Set Long-Tailed Recognition of Wild Animals in Camera-Trap Images

    Jeongsoo Kim, Sangmin Woo, Byeongjun Park, Changick Kim

    ICIP 2022

    [ paper ]

  • General Learning

    Impact of Sentence Representation Matching in Neural Machine Translation

    Heeseung Jung, Kangil Kim, Jong-Hun Shin, Seung-Hoon Na, Sangkeun Jung, Sangmin Woo

    Applied Sciences 2022

    [ paper ]


    2021
  • Multi-modal Video

    What and When to Look?: Temporal Span Proposal Network for Video Visual Relation Detection

    Sangmin Woo, Junhyug Noh, Kangil Kim

    Under review at ESWA

    [ paper | code ]

  • General Learning

    Revisiting Dropout: Escaping Pressure for Training Neural Networks with Multiple Costs

    Sangmin Woo, Kangil Kim, Junhyug Noh, Jong-Hun Shin, Seung-Hoon Na

    Electronics 2021

    [ paper | code ]

Academic Activities

  • Reviewer at CVPR (2024 ~), ECCV (2024 ~)
  • Reviewer at NeurIPS (2024 ~), ICLR (2024 ~), AAAI (2023 ~), AISTATS (2025 ~)
  • Reviewer at IEEE TNNLS, IEEE TCSVT

Awards & Honors

  • Invited Paper Talk at CARAI Workshop. Center for Applied Research in Artificial Intelligence. Oct, 2023
  • Finalist, 29th HumanTech Paper Award, Samsung Electronics Co., Ltd. Dec, 2022
  • Top Award ($10,000), LG Electronics Robot Contest, LG Electronics Co., Ltd. Dec, 2021
  • Excellence Award ($500), Creative Space G A.I&IoT Makerthon, GIST. Nov, 2019

Patents

  • Method and Apparatus for Human Activity Recognition using Accelerometer and Gyroscope Sensors (KR Patent Application: 10-2022-0094911)
  • Method and Device for Inferring Dynamic Relationship between Objects in Video (KR Patent Application: 10-2021-0125704)
  • Scene Graph Generation Apparatus (KR Patent 10-2254-7680000)