Hi, this is Wei Huang (黄炜)’s website! I am currently a Ph.D. student advised by Prof. Xiaojuan Qi and Prof. Shiming Zhang, and co-supervised by Prof. Zhongrui Wang. Previously, I obtained my bachelor’s degree in computer science (June 2023) from Beihang University, where I was advised by Prof. Si Liu and also worked with Prof. Xianglong Liu.

⛵ Now, I am fortunate to be collaborating closely with Dr. Yukang Chen and Dr. Ligeng Zhu on Efficient-Large-Model, led by Prof. Song Han.

I am currently conducting research on efficient/tiny deep learning and its applications, including:

🚀 Efficient AI: efficiency of large language/vision-language models and diffusion models (e.g., model quantization/binarization).

🔥 Brain-inspired Computing: neuromorphic computing and hardware acceleration (e.g., spiking neural networks, SNNs).

Edge AI: edge AI for wearable devices and digital health.

🔥 News

  • 2025.02:  🎉🎉 One paper on a chain-of-thought video benchmark (VideoEspresso) is accepted by CVPR’25!
  • 2025.01:  🎉🎉 Three papers are accepted by ICLR’25: one on MoE-LLM compression (MC-MoE) and two on data efficiency and dynamic neural networks (InfoMax: data pruning; From-Layers-to-States: dynamic neural network layers)!
  • 2024.12:  🎉🎉 One technical report is accepted by Visual Intelligence!
  • 2024.12:  🎉🎉 One review on AI in wearable diabetes management is accepted by Advanced Intelligent Systems!
  • 2024.05:  🎉🎉 One paper on SNN security on RRAM is accepted by ICCAD’24!
  • 2024.04:  🎉🎉 BiLLM is accepted by ICML’24!
  • 2024.02:  Released BiLLM: Pushing the Limit of Post-Training Quantization for LLMs, the first post-training quantization work pushing LLMs to nearly 1 bit. Please check our paper and code!

💬 Invited Talks and Reports

  • 2024.05: BiLLM was reported by IEEE Spectrum. Thanks to Matthew for the interview and report. Please see the link.
  • 2024.05: AI-Time online talk on BiLLM. Please see the video.
  • 2024.04: Our empirical study How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study (new version: An Empirical Study of LLaMA3 Quantization: From LLMs to MLLMs) was reported by QbitAI (量子位). Please see the link.
  • 2024.03: Our BiLLM: Pushing the Limit of Post-Training Quantization for LLMs was reported by QbitAI (量子位). Please see the link.

📝 Publications

CVPR 2025

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection

Songhao Han, Wei Huang, Hairong Shi, Le Zhuo, Xiu Su, Shifeng Zhang, Xu Zhou, Xiaojuan Qi, Yue Liao, Si Liu

  • A novel dataset designed to enhance video reasoning by addressing the scale and granularity limitations of existing datasets.
  • A Hybrid LVLMs Collaboration framework that achieves cost-effective and accurate video reasoning, outperforming baseline models on the majority of tasks in our proposed benchmark.
  • VideoEspresso sets a new starting point for video reasoning, offering rich annotations that facilitate advanced multimodal understanding.
ICLR 2025

Data Pruning by Information Maximization

Haoru Tan, Sitong Wu, Wei Huang, Shizhen Zhao, Xiaojuan Qi

  • A new coreset algorithm that maximizes overall information by accounting for each sample’s individual contribution while reducing information overlap, maintaining both diversity and importance (a toy sketch of the selection objective follows this entry).
  • An efficient gradient-based solver, enhanced by sparsification techniques and dataset-partitioning strategies, that scales InfoMax to large datasets.
  • InfoMax consistently outperforms state-of-the-art schemes across a range of tasks, including image classification, vision-language pre-training, and supervised fine-tuning of large language models.
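
To make the selection idea concrete, here is a toy sketch in the spirit of InfoMax: greedily pick samples whose importance score is high while penalizing overlap with what has already been selected. All names here are mine, and the paper’s actual solver is gradient-based rather than greedy.

```python
import numpy as np

def greedy_infomax_coreset(features, scores, k, redundancy_weight=1.0):
    """Greedily pick k samples with high importance (`scores`) while
    penalizing information overlap, measured here by cosine similarity
    to already-selected samples."""
    # Normalize features so dot products become cosine similarities.
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    selected = []
    max_sim = np.zeros(len(feats))  # each sample's overlap with the selected set
    for _ in range(k):
        gain = scores - redundancy_weight * max_sim  # importance minus overlap
        gain[selected] = -np.inf                     # never re-pick a sample
        i = int(np.argmax(gain))
        selected.append(i)
        max_sim = np.maximum(max_sim, feats @ feats[i])
    return selected

# Usage: pick a 100-sample coreset from 1,000 random 128-d features.
rng = np.random.default_rng(0)
coreset = greedy_infomax_coreset(rng.normal(size=(1000, 128)),
                                 rng.random(1000), k=100)
```
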
ICLR 2025

From Layers to States: A State Space Model Perspective to Deep Neural Network Layer Dynamics

Qinshuo Liu, Weiqin Zhao, Wei Huang, Yanwen Fang, Lequan Yu, Guodong Li

  • We treat the outputs of a deep neural network’s layers as states of a continuous process and leverage a state space model (SSM) to design layer aggregation. To the best of our knowledge, this is the first time such a perspective has been presented (a minimal sketch of the idea follows this entry).
  • This leads to a lightweight module, the Selective State Space Model Layer Aggregation (S6LA) module, which conceptualizes a neural network as a selective state space model (S6) and handles layer interactions through the selective mechanism of long-sequence modeling.
  • Compared with other state-of-the-art convolutional and transformer-based layer aggregation models, S6LA demonstrates superior performance on classification, detection, and instance segmentation tasks.
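
For intuition, below is a deliberately minimal sketch of the “layers as states” view: each layer’s output is folded into a running state through input-dependent (selective) gates. This is my own simplification, not the actual S6LA module.

```python
import torch
import torch.nn as nn

class SelectiveLayerState(nn.Module):
    """Maintains a running state over layer outputs x_t using
    input-dependent (selective) gates: h_t = a(x_t)*h_{t-1} + b(x_t)*x_t."""
    def __init__(self, dim):
        super().__init__()
        self.gate_a = nn.Linear(dim, dim)  # decay gate, conditioned on the input
        self.gate_b = nn.Linear(dim, dim)  # input gate, conditioned on the input

    def forward(self, layer_outputs):
        # layer_outputs: list of per-layer features, each of shape (batch, dim)
        h = torch.zeros_like(layer_outputs[0])
        for x in layer_outputs:
            a = torch.sigmoid(self.gate_a(x))  # selective: depends on x itself
            b = torch.sigmoid(self.gate_b(x))
            h = a * h + b * x                  # SSM-style state update
        return h  # aggregated representation across layers

# Usage: aggregate the outputs of a 12-layer backbone with 256-d features.
agg = SelectiveLayerState(256)
outs = [torch.randn(8, 256) for _ in range(12)]
fused = agg(outs)  # shape (8, 256)
```
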
ICLR 2025

MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More

Wei Huang, Yue Liao, Jianhui Liu, Ruifei He, Haoru Tan, Shiming Zhang, Hongsheng Li, Si Liu, Xiaojuan Qi

  • MC-MoE enables accurate weight-only quantization (weights at 1.5–2.5 bits).
  • MC-MoE performs efficient online dynamic pruning (additional compression ratio > 10%).
  • MC-MoE integrates static quantization and dynamic pruning to collaboratively achieve extreme compression of MoE-LLMs with little accuracy loss, ensuring an optimal trade-off between performance and efficiency (a toy sketch of the bit allocation follows this entry).
  • For instance, at 2.54 bits, MC-MoE compresses 76.6% of the model with only a 3.8% average accuracy loss. During dynamic inference, we further reduce activated parameters by 15%, with a performance drop of less than 0.6%.
[paper] [code] [abstract]
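
As a rough illustration of the mixture-precision idea, the sketch below greedily grants higher bit-widths to more important (e.g., more frequently activated) experts under an average bit budget. The greedy rule and all names are mine; MC-MoE formulates the allocation far more carefully.

```python
import numpy as np

def allocate_expert_bits(importance, candidate_bits=(1.5, 2.0, 2.5), budget=1.8):
    """Greedily promote the most important experts to higher bit-widths while
    keeping the average bit-width within `budget` (experts assumed equal-sized,
    so the mean over experts is the average bit-width)."""
    bits = np.full(len(importance), min(candidate_bits))  # start at lowest precision
    order = np.argsort(importance)[::-1]                  # most important first
    for b in sorted(candidate_bits)[1:]:                  # try each higher precision
        for i in order:
            trial = bits.copy()
            trial[i] = b
            if trial.mean() <= budget:                    # stay within the budget
                bits = trial
    return bits

# Usage: experts weighted by activation frequency; hot experts get more bits.
freq = np.array([0.30, 0.25, 0.20, 0.10, 0.08, 0.07])
print(allocate_expert_bits(freq))  # -> [2.  2.  2.  1.5 1.5 1.5]
```
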
Arxiv

SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models

Wei Huang, Haotong Qin, Yangdong Liu, Yawei Li, Xianglong Liu, Luca Benini, Michele Magno, Xiaojuan Qi

  • A novel scheme that observes and proves the structured clustering of salient elements in LLM weight matrices (a toy sketch of group-wise bit assignment follows this entry).
  • The first group-wise mixed-precision quantization framework for LLMs.
  • Serves as a plug-and-play approach for GPTQ/OmniQuant/…, improving inference-friendly methods under low-bit quantization.
[paper] [code] [abstract]
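
Below is a toy sketch of group-wise mixed precision, assuming a simple magnitude-based salience score; SliM-LLM’s actual salience metric and quantizer are considerably more refined.

```python
import torch

def groupwise_bit_assignment(weight, group_size=128, high_bits=3, low_bits=1,
                             high_ratio=0.25):
    """Split each row of `weight` into contiguous groups, rank the groups by
    salience (mean |w| as a stand-in), and give the top fraction more bits."""
    out_features, in_features = weight.shape
    groups = weight.reshape(out_features, in_features // group_size, group_size)
    salience = groups.abs().mean(dim=-1)                 # one score per group
    k = int(high_ratio * salience.numel())
    threshold = salience.flatten().topk(k).values.min()  # cut-off for "salient"
    bits = torch.full_like(salience, float(low_bits))
    bits[salience >= threshold] = high_bits              # salient groups get more bits
    return bits  # (out_features, n_groups) bit-width map

# Usage: a 16x1024 weight with 128-wide groups; average bit-width ~1.5.
w = torch.randn(16, 1024)
print(groupwise_bit_assignment(w).mean())  # ~1.5 bits on average
```
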
ICCAD 2024

SNNGX: Securing Spiking Neural Networks with Genetic XOR Encryption on RRAM-based Neuromorphic Accelerator

Kwunhang Wong, Songqi Wang, Wei Huang, Xinyuan Zhang, Yangu He, Karl M.H. Lai, Yuzhong Jiao, Ning Lin, Xiaojuan Qi, Xiaoming Chen, Zhongrui Wang

  • The first IP-protection scheme specifically for SNNs, leveraging a genetic algorithm combined with classic XOR encryption to secure the networks against unauthorized access and tampering (a toy sketch of the XOR step follows this entry).
  • A flexible solution for securing SNNs across various applications, especially in critical domains such as biomedical applications, where model security is paramount.
[paper] [code] [abstract]
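
The XOR step itself is simple, as the sketch below shows on INT8 weights; the heavy lifting in SNNGX is the genetic search for which bits to encrypt, which this toy example replaces with a fixed sign-bit mask.

```python
import numpy as np

def xor_encrypt_weights(weights_int8, bit_mask, key):
    """Flip selected bits of quantized weights with an XOR key: running the
    model without the key leaves it scrambled; XOR-ing again restores it."""
    raw = weights_int8.view(np.uint8)
    encrypted = raw ^ (bit_mask & key)  # only bits chosen by the mask flip
    return encrypted.view(np.int8)

rng = np.random.default_rng(0)
w = rng.integers(-128, 128, size=100, dtype=np.int8)
mask = np.uint8(0b10000000)                  # e.g., encrypt only the sign bit
key = np.uint8(0b10000000)
enc = xor_encrypt_weights(w, mask, key)
dec = xor_encrypt_weights(enc, mask, key)    # XOR is its own inverse
assert np.array_equal(w, dec)
```
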
Visual Intelligence

An Empirical Study of LLaMA3 Quantization: From LLMs to MLLMs

Wei Huang, Xingyu Zheng, Xudong Ma, Haotong Qin, Chengtao Lv, Hong Chen, Jie Luo, Xiaojuan Qi, Xianglong Liu, Michele Magno

  • Explores the performance of the LLaMA3 model series under existing post-training quantization and LoRA fine-tuning methods.
  • Points out the significant performance loss of LLaMA3-based MLLMs under low-bit post-training quantization.
  • Highlights the significant low-bit performance gap that needs to be bridged in future developments.
[paper] [code] [abstract]
ICML 2024

BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Wei Huang, Yangdong Liu, Haotong Qin, Ying Li, Shiming Zhang, Xianglong Liu, Michele Magno, Xiaojuan Qi

  • Compresses LLM weights to as low as 1.08–1.1 bits, exceeding the performance of previous quantization methods at 2 bits or even 3 bits (a minimal sketch of the binarization idea follows this entry).
  • Implements a high-performance binary LLM in the post-training quantization (PTQ) setting, efficiently achieving near-1-bit LLM compression without additional training or backpropagation.
[paper] [code] [abstract]
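
For intuition, here is a minimal sketch of residual binarization with salient-column splitting in the spirit of BiLLM. It is simplified: salience is plain weight magnitude here, whereas the paper derives salience from second-order (Hessian) information and splits non-salient weights further.

```python
import torch

def binarize(w):
    """1-bit approximation: W ~ alpha * sign(W), with alpha = mean(|W|)."""
    alpha = w.abs().mean()
    return alpha * torch.sign(w)

def billm_style_quant(w, salient_ratio=0.1):
    """Binarize most weights once; give the most salient columns (by
    magnitude here) a second, residual binarization for extra fidelity."""
    col_salience = w.abs().mean(dim=0)
    k = max(1, int(salient_ratio * w.shape[1]))
    salient = col_salience.topk(k).indices
    q = binarize(w)
    residual = w[:, salient] - q[:, salient]
    q[:, salient] += binarize(residual)  # second-order binary residual
    return q

w = torch.randn(64, 64)
q = billm_style_quant(w)
print((w - q).pow(2).mean())  # lower error than plain binarize(w)
```
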
Arxiv

On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks

Wei Huang, Haotong Qin, Yangdong Liu, Jingzhuo Liang, Yulun Zhang, Ying Li, Xianglong Liu

  • Combines IP-core-level awareness of chip runtime clock and power with network sensitivity, achieving a better balance of computational efficiency and accuracy on edge devices (a toy sketch of the search follows this entry).
  • Allows target networks to be compressed and deployed with high accuracy on edge chips with limited computational resources and ultra-low power consumption.
  • Efficiently performs online quantization and optimization without additional devices or data access.
[paper] [abstract]
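
A hypothetical sketch of the search loop: candidate bit-width configurations are filtered by a mocked on-chip power reading and ranked by a toy sensitivity model. The telemetry and cost functions below are stand-ins of my own, not the paper’s implementation.

```python
import itertools

def measure_power(bits_per_layer):
    """Mock stand-in for reading on-chip power telemetry from the IP cores."""
    return sum(b * 0.5 for b in bits_per_layer)

def sensitivity_loss(bits_per_layer, sens):
    """Toy accuracy model: lowering bits on sensitive layers costs more."""
    return sum(s / b for s, b in zip(sens, bits_per_layer))

def search(sens, candidates=(2, 4, 8), power_budget=10.0):
    """Pick per-layer bit-widths minimizing the toy sensitivity loss
    while staying under the (mock) power budget."""
    best, best_loss = None, float("inf")
    for cfg in itertools.product(candidates, repeat=len(sens)):
        if measure_power(cfg) > power_budget:
            continue  # infeasible under this power budget
        loss = sensitivity_loss(cfg, sens)
        if loss < best_loss:
            best, best_loss = cfg, loss
    return best

# Usage: four layers; the most sensitive layer keeps the most bits.
print(search(sens=[1.0, 0.2, 0.5, 0.1]))  # -> (8, 4, 4, 4)
```
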

📖 Education

  • 2023.09 - present, Ph.D. student, Department of Electrical and Electronic Engineering, The University of Hong Kong.
  • 2019.09 - 2023.06, B.Eng. in Computer Science, School of Computer Science and Engineering, Beihang University.

🗒️ Academic Services

  • Conference: ICLR, NeurIPS, ICML, ECCV, AISTATS, ICCV.
  • Journal: Neural Networks.
  • Program Committee member for Practical Deep Learning Workshop, IEEE CAI 2024.

🎖 Honors and Awards

  • 2023: Outstanding Graduate, Beihang University; Outstanding Project of the 16th National College Student Innovation and Entrepreneurship Competition, China.
  • 2022: Outstanding Project of the 15th National College Student Innovation and Entrepreneurship Competition, China; Second-class Social Practice Scholarship, Beihang University; Second-class Subject Competition Scholarship, Beihang University; Second-class scholarship, Beihang University; 3rd Prize of the 32nd “Feng Ru Cup” Competition; 3rd Prize of the “Lan Qiao Cup” programming competition (Python), Beijing.
  • 2021: Third-class Innovation and Entrepreneurship Scholarship, Beihang University; Second-class Social Practice Scholarship, Beihang University; Second-class Subject Competition Scholarship, Beihang University; Outstanding Teaching Assistant, Beihang University.
  • 2020: 2nd Prize of the 31st “Feng Ru Cup” Competition; First-class scholarship, Beihang University.

💻 Internships & Teaching Services

  • 2022.09 - 2023.01, AI algorithm intern working on model inference acceleration, Enflame, China.
  • 2022.08 - 2023.01, TA for Frontiers in Artificial Intelligence, Beihang University.
  • 2022.08 - 2023.01, TA for Computer Hardware Basics (head of the TA team), Beihang University.
  • 2021.08 - 2022.01, TA for Computer Hardware Basics (head of the TA team), Beihang University.
  • 2021.03 - 2021.06, TA for Discrete Mathematics (head of the TA team), Beihang University.