Weijia Wu

I am currently a research fellow at Show Lab, National University of Singapore, working with Prof. Mike Z. Shou. I received my PhD from Zhejiang University. I am also a popper and have been dancing popping for five years. In my spare time, I enjoy playing basketball and swimming.

Email / Google Scholar / GitHub / Twitter / LinkedIn

Research Interests

My current research focuses on AI for camera-based video production, including video generation and editing, controllable video generation, and long video generation.

Note: Talks and collaborations of any form (including job opportunities) are welcome. Feel free to contact me by email (weijiawu96@gmail.com).

  • Video Text Detection/Spotting: TransDETR (IJCV 2024), DSText (ICDAR 2023 & PR 2024), BOVText (NeurIPS 2021, Dataset Track)
  • Video Retrieval: TextVR (PR 2023)
  • Synthetic Data for Perception Tasks: DiffuMask (ICCV 2023), DatasetDM (NeurIPS 2023)
  • Image Generation: ParaDiffusion (arXiv, Nov. 2023)
  • Video Generation: DragAnything (ECCV 2024), Awesome-Video-Diffusion (3.4k stars), MovieBench (arXiv, Nov. 2024)
  • Unified Multimodal Models: Awesome Unified Multimodal Models (220 stars), Blog: Towards Unified Multimodal Models: Trends and Insights

Recent Updates

  • [Sep. 2024]: One paper (ZipCache) was accepted to NeurIPS 2024!
  • [Jul. 2024]: One paper (TextVR) was accepted to Pattern Recognition 2024!
  • [Jul. 2024]: Three papers (DragAnything, MotionDirector (Oral), ControlCap) were accepted to ECCV 2024!
  • [Mar. 2024]: One paper (TransDETR) was accepted to IJCV!
  • [Feb. 2024]: One paper (DiverGen) was accepted to CVPR 2024!
  • [Jan. 2024]: Received my Ph.D. degree from Zhejiang University!
  • [Jan. 2024]: One paper (EfficientDM) was accepted to ICLR 2024 (spotlight)!
  • [Nov. 2023]: One paper (DSText V2) was accepted to Pattern Recognition 2024!
  • [Nov. 2023]: One paper (CisDQ) was accepted to IEEE TCSVT 2023!
  • [Sep. 2023]: Three papers (DatasetDM, Mix-of-Show, PTQD) were accepted to NeurIPS 2023!
  • [Jul. 2023]: Three papers (DiffuMask, GenPromp, BiViT) were accepted to ICCV 2023!
  • [Dec. 2022]: We are organizing the LOng-form VidEo Understanding and Generation Workshop & International Challenge @ CVPR'23!
  • [Dec. 2022]: We are organizing the ICDAR 2023 Video Text Reading Competition for Dense and Small Text!
  • [Jun. 2022]: One paper was accepted to ICIP 2022!
  • [Mar. 2022]: Serving as a reviewer for ICML 2022.
  • [Jul. 2021]: One paper was accepted to NeurIPS 2021!
  • [Jun. 2021]: Serving as a reviewer for NeurIPS 2021.
Experience

  • [January 2021 - January 2022]: Research Intern at MMU, Kuaishou, led by Debing Zhang
  • [January 2022 - August 2023]: Research Intern at MMU, Kuaishou, led by Jiahong Li
  • [August 2022 - August 2023]: Visiting PhD student at Show Lab, NUS, led by Asst. Prof. Mike Shou
  • [Dec. 2023 - Present]: Research Fellow at Show Lab, NUS, working with Asst. Prof. Mike Shou

Selected Publications

    2024
    DragAnything: Motion Control for Anything using Entity Representation

    Weijia Wu, Zhuang Li, Yuchao Gu, Rui Zhao, Yefei He, David Junhao Zhang, Mike Zheng Shou, Yan Li, Tingting Gao, Di Zhang.
    The 18th European Conference on Computer Vision (ECCV 2024)

    Abstract / arXiv / BibTex / GitHub / Project Page / Twitter(X)

    EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models

    Yefei He, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang.
    The Twelfth International Conference on Learning Representations (ICLR 2024, spotlight)

    Abstract / arXiv / BibTex

    2023
    PTQD: Accurate Post-Training Quantization for Diffusion Models†

    Yefei He, Luping Liu, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang.
    Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)

    Abstract / arXiv / BibTex

    DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

    Weijia Wu, Yuzhong Zhao, Hao Chen, Yuchao Gu, Rui Zhao, Yefei He, Hong Zhou, Mike Zheng Shou, Chunhua Shen.
    Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)

    Abstract / arXiv / BibTex / GitHub / Project Page / Chinese WeChat article

    BiViT: Extremely Compressed Binary Vision Transformers

    Yefei He, Zhenyu Lou, Luoming Zhang, Weijia Wu, Bohan Zhuang, Hong Zhou.
    International Conference on Computer Vision (ICCV 2023)

    Abstract / arXiv / BibTex

    Generative Prompt Model for Weakly Supervised Object Localization

    Yuzhong Zhao, Qixiang Ye, Weijia Wu, Chunhua Shen, Fan Wan.
    International Conference on Computer Vision (ICCV 2023)

    Abstract / arXiv / BibTex / GitHub

    DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models

    Weijia Wu, Yuzhong Zhao, Mike Zheng Shou, Hong Zhou, Chunhua Shen.
    International Conference on Computer Vision (ICCV 2023)

    Abstract / arXiv / BibTex / GitHub / Project Page

    2022
    End-to-End Video Text Spotting with Transformer

    Weijia Wu, Yuanqiang Cai, Chunhua Shen, Debing Zhang, Ying Fu, Ping Luo, Hong Zhou.
    International Journal of Computer Vision (IJCV)

    Abstract / arXiv / IJCV / BibTex / GitHub / YouTube Demo / Zhihu

    Real-time End-to-End Video Text Spotter with Contrastive Representation Learning

    Weijia Wu, Zhuang Li, Jiahong Li, Chunhua Shen, Size Li, Zhongyuan Wang, Ping Luo, Hong Zhou.
    arXiv preprint

    Abstract / arXiv / BibTex / Code / YouTube Demo

    Polygon-free: Unconstrained Scene Text Detection with Box Annotations

    Weijia Wu, Enze Xie, Ruimao Zhang, Wenhai Wang, Ping Luo, Hong Zhou.
    IEEE International Conference on Image Processing (ICIP 2022)

    Abstract / arXiv / BibTex / Code

    ECLIP: Efficient Contrastive Language-Image Pretraining via Ensemble Confidence Learning and Masked Language Modeling

    Jue Wang, Haofan Wang, Weijia Wu, Jincan Deng, Debing Zhang.
    ICML 2022 Pre-training Workshop, 2022

    Abstract / arXiv / BibTex

    2021
    A Bilingual, Open World Video Text Dataset and End-to-end Video Text Spotter with Transformer

    Weijia Wu, Yuanqiang Cai, Debing Zhang, Jiahong Li, Hong Zhou.
    NeurIPS 2021 Track on Datasets and Benchmarks (NeurIPS), 2021

    Abstract / arXiv / Homepage / BibTex / GitHub / Demo / Chinese WeChat article / Zhihu write-ups by others (with thanks)

    2020
    Synthetic-to-Real Unsupervised Domain Adaptation for Scene Text Detection in the Wild

    Weijia Wu, Ning Lu, Enze Xie, Hong Zhou.
    Proceedings of the Asian Conference on Computer Vision (ACCV), 2020.

    Abstract / arXiv / BibTex

    Academic Service

    Conference Review

  • International Conference on Machine Learning (ICML), 2022, 2023, 2024.
  • Neural Information Processing Systems (NeurIPS), 2021, 2022, 2023, 2024.
  • NeurIPS Datasets and Benchmarks Track, 2021, 2022, 2023, 2024.
  • Annual Meeting of the Association for Computational Linguistics (ACL), 2024.
  • International Conference on Learning Representations (ICLR), 2023, 2024, 2025.
  • AAAI Conference on Artificial Intelligence (AAAI), 2025.
  • IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
  • European Conference on Computer Vision (ECCV), 2024.
  • CVPR 2024 Workshop SyntaGen.
    Journal Review

  • IEEE Transactions on Neural Networks and Learning Systems (TNNLS).
  • IEEE Transactions on Circuits and Systems for Video Technology (TCSVT).
  • International Journal of Computer Vision (IJCV).
  • ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM).
  • Engineering Applications of Artificial Intelligence (EAAI).
Awards

  • [10/2020] National Scholarship of China.
Talks

  • [10/2021] MMU, Kuaishou. Video Text Spotting.


Template credits: Dr. Jon Barron