
Sangmin Woo

Ph.D. Candidate in EE @ KAIST


I am currently pursuing a Ph.D. degree in Electrical Engineering at KAIST. In 2021, I completed an M.S. degree in Electrical Engineering and Computer Science at GIST. Prior to that, I obtained a B.S. degree in Electrical Engineering from KNU in 2019.

Humans are inherently multi-modal learners, with vision playing a pivotal role in shaping our understanding of the world. I am passionate about bridging the gap between machine perception and human-level understanding by harnessing the potential of multi-modal learning.

My work explores the following topics, among others:

  • Multi-modal: Vision + X ∈ {Language, Audio, Depth, etc.}
  • Video/Image Understanding
  • Generation & Diffusion Models

I thrive on creative challenges and enjoy building strong relationships along the way. Explore my academic journey below, and contact me directly to learn more.

Contact

  • smwoo95 [at] kaist.ac.kr

    shmwoo9395 [at] gmail.com

  • 291, Daehak-ro, Yuseong-gu, Daejeon, Republic of Korea 34141

Education

  • Ph.D. in EE, KAIST, 2025 (Expected)

  • M.S. in EECS, GIST, 2021

    on "Learning to Detect Visual Relationships in Images and Videos"

  • B.S. in EE, KNU, 2019

Publications


    2024
  • Generation

    Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts

    Byeongjun Park*, Hyojun Go*, Jinyoung Kim, Sangmin Woo, Seokil Ham, Changick Kim (*Equal Contribution)

    arXiv 2024

    [ paper | code | project ]

  • Generation

    HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D

    Sangmin Woo*, Byeongjun Park*, Hyojun Go, Jinyoung Kim, Changick Kim (*Equal Contribution)

    CVPR 2024

    [ paper | code | project | demo ]

  • Generation

    Denoising Task Routing for Diffusion Models

    Byeongjun Park*, Sangmin Woo*, Hyojun Go*, Jinyoung Kim*, Changick Kim (*Equal Contribution)

    ICLR 2024

    [ paper | code | project ]

  • Multi-modal Video Understanding

    Sketch-based Video Object Localization

    Sangmin Woo, So-Yeong Jeon, Jinyoung Park, Minji Son, Sumin Lee, Changick Kim

    WACV 2024

    [ paper | code ]


    2023
  • Multi-modal Video Understanding

    AHFu-Net: Align, Hallucinate, and Fuse Network for Missing Multimodal Action Recognition

    Muhammad Adi Nugroho, Sangmin Woo, Sumin Lee, Changick Kim

    VCIP 2023 (Oral Presentation)

    [ paper ]

  • Multi-modal Video Understanding

    Multi-modal Social Group Activity Recognition in Panoramic Scene

    Donguk Kim, Sumin Lee, Sangmin Woo, Jinyoung Park, Muhammad Adi Nugroho, Changick Kim

    VCIP 2023

    [ paper ]

  • Multi-modal Video Understanding

    Cross-Modal Alignment and Translation for Missing Modality Action Recognition

    Yeonju Park, Sangmin Woo, Sumin Lee, Muhammad Adi Nugroho, Changick Kim

    CVIU 2023

    [ paper ]

  • Multi-modal Video Understanding

    Modality Mixer Exploiting Complementary Information for Multi-modal Action Recognition

    Sumin Lee, Sangmin Woo, Muhammad Adi Nugroho, Changick Kim

    Under review at IEEE TIP

    [ paper ]

  • Multi-modal Video Understanding

    Audio-Visual Glance Network for Efficient Video Recognition

    Muhammad Adi Nugroho, Sangmin Woo, Sumin Lee, Changick Kim

    ICCV 2023
    Invited Paper Talk @ CARAI Workshop

    [ paper ]

  • Multi-modal Video Understanding

    Towards Good Practices for Missing Modality Robust Action Recognition

    Sangmin Woo, Sumin Lee, Yeonju Park, Muhammad Adi Nugroho, Changick Kim

    AAAI 2023 (Oral Presentation)

    [ paper | code ]

  • Multi-modal Video Understanding

    Modality Mixer for Multi-modal Action Recognition

    Sumin Lee, Sangmin Woo, Yeonju Park, Muhammad Adi Nugroho, Changick Kim

    WACV 2023

    [ paper ]


    2022
  • Multi-modal Video Understanding

    Explore and Match: Bridging Proposal-Based and Proposal-Free with Transformer for Sentence Grounding in Videos

    Sangmin Woo, Jinyoung Park, Inyong Koo, Sumin Lee, Minki Jeong, Changick Kim

    arXiv 2022
    Finalist, 29th HumanTech Paper Award, Samsung Electronics Co., Ltd.

    [ paper | code ]

  • Image Understanding

    Tackling the Challenges in Scene Graph Generation with Local-to-Global Interactions

    Sangmin Woo, Junhyug Noh, Kangil Kim

    TNNLS 2022

    [ paper | code ]

  • Image Understanding

    Temporal Flow Mask Attention for Open-Set Long-Tailed Recognition of Wild Animals in Camera-Trap Images

    Jeongsoo Kim, Sangmin Woo, Byeongjun Park, Changick Kim

    ICIP 2022

    [ paper ]

  • General Learning

    Impact of Sentence Representation Matching in Neural Machine Translation

    Heeseung Jung, Kangil Kim, Jong-Hun Shin, Seung-Hoon Na, Sangkeun Jung, Sangmin Woo

    Applied Sciences 2022

    [ paper ]


    2021
  • Video Understanding

    What and When to Look?: Temporal Span Proposal Network for Video Visual Relation Detection

    Sangmin Woo, Junhyug Noh, Kangil Kim

    Under review at ESWA

    [ paper | code ]

  • General Learning

    Revisiting Dropout: Escaping Pressure for Training Neural Networks with Multiple Costs

    Sangmin Woo, Kangil Kim, Junhyug Noh, Jong-Hun Shin, Seung-Hoon Na

    Electronics 2021

    [ paper | code ]

Research Experiences

  • Computational Intelligence Lab @ KAIST
    Sep 2021 - Present

    Research Assistant
  • Robot Vision Team @ NAVER LABS
    Apr 2023 - Aug 2023

    Research Intern
  • Intelligence Representation and Reasoning Lab @ GIST
    Sep 2019 - Aug 2021

    Research Assistant

Academic Activities

  • Reviewer at CVPR (2024 ~)
  • Reviewer at ECCV (2024 ~)
  • Reviewer at AAAI (2023 ~)
  • Reviewer at IEEE TNNLS, IEEE TCSVT

Awards & Honors

  • Invited Paper Talk ($100), CARAI Workshop, Center for Applied Research in Artificial Intelligence. Oct. 2023
  • Finalist, 29th HumanTech Paper Award, Samsung Electronics Co., Ltd. Dec. 2022
  • Top Award ($10,000), LG Electronics Robot Contest, LG Electronics Co., Ltd. Dec. 2021
  • Excellence Award ($500), Creative Space G A.I&IoT Makerthon, GIST. Nov. 2019

Patents

  • Method and Apparatus for Human Activity Recognition using Accelerometer and Gyroscope Sensors (KR Patent Application: 10-2022-0094911)
  • Method and Device for Inferring Dynamic Relationship between Objects in Video (KR Patent Application: 10-2021-0125704)
  • Scene Graph Generation Apparatus (KR Patent 10-2254-7680000)