Hi, this is Wei Huang (黄炜)'s website! I have been a Ph.D. student in the Computer Vision and Machine Intelligence Lab (CVMI Lab) @ HKU and the Wearable, Intelligent and Soft Electronics Lab (WISE Lab) since September 2023, advised by Prof. Xiaojuan Qi and Prof. Shiming Zhang, and co-supervised by Prof. Zhongrui Wang. Previously, I obtained my bachelor's degree in computer science (June 2023) from Beihang University, where I was advised by Prof. Si Liu and also worked with Prof. Xianglong Liu.

I am currently conducting research on efficient/tiny deep learning and its applications, including:

🚀 Efficient AI: Efficiency of large language/vision-language models and diffusion models (e.g., model quantization/binarization).

🔥 Brain-inspired Computing: Neuromorphic computing and hardware acceleration (e.g., spiking neural networks, SNNs).

Edge AI: Edge AI for wearables and digital health.

I'm actively seeking internship and visiting opportunities. If you have any available, I would greatly appreciate it if you could reach out to me. Thank you!

🔥 News

  • 2024.10:  Released MC-MoE, a mixture compressor for MoE LLMs that combines static quantization and dynamic pruning. Please check our paper and code!
  • 2024.05:  🎉🎉 One co-authored paper is accepted by ICCAD’24!
  • 2024.05:  Released SliM-LLM, a plug-and-play group-wise mixed-precision quantization framework for 2-bit LLMs. Please check our paper, code, and huggingface!
  • 2024.04:  Released An Empirical Study of LLaMA3 Quantization: From LLMs to MLLMs, an empirical study of the performance of low-bit quantized LLMs/MLLMs based on LLaMA-3. Please check our paper, code, and huggingface!
  • 2024.04:  🎉🎉 BiLLM is accepted by ICML’24!
  • 2024.02:  Released BiLLM: Pushing the Limit of Post-Training Quantization for LLMs, the first post-training quantization work pushing LLMs to nearly 1 bit. Please check our paper and code!
  • 2023.09:  Released OHQ, an on-chip, hardware-aware mixed-precision quantization work. Please check our paper!
  • 2022.10:  Released VLSNR, a multimodal news recommendation system. Please check our paper and code!

💬 Invited Talks and Reports

  • 2024.05: BiLLM was reported by IEEE Spectrum. Thanks to Matthew for the interview and report. Please see the link.
  • 2024.05: AI-Time online talk on BiLLM. Please see the video.
  • 2024.04: Our empirical study How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study (new version: An Empirical Study of LLaMA3 Quantization: From LLMs to MLLMs) was reported by QbitAI (量子位). Please see the link.
  • 2024.03: Our BiLLM: Pushing the Limit of Post-Training Quantization for LLMs was reported by QbitAI (量子位). Please see the link.

📝 Publications

Arxiv

MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More

Wei Huang, Yue Liao, Jianhui Liu, Ruifei He, Haoru Tan, Shiming Zhang, Hongsheng Li, Si Liu, Xiaojuan Qi

  • MC-MoE enables accurate weight-only quantization (1.5–2.5-bit weights).
  • MC-MoE enables efficient online dynamic pruning (additional compression ratio > 10%).
  • MC-MoE integrates static quantization and dynamic pruning to collaboratively achieve extreme compression for MoE-LLMs with minimal accuracy loss, ensuring an optimal trade-off between performance and efficiency.
  • For instance, at 2.54 bits, MC-MoE compresses 76.6% of the model with only a 3.8% average accuracy loss. During dynamic inference, we further reduce activated parameters by 15%, with a performance drop of less than 0.6%.
[paper] [code] [abstract]
Arxiv

SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models

Wei Huang, Haotong Qin, Yangdong Liu, Yawei Li, Xianglong Liu, Luca Benini, Michele Magno, Xiaojuan Qi

  • A novel scheme that observes and demonstrates the structured clustering of salient elements in LLM weight matrices.
  • The first group-wise mixed-precision quantization framework for LLMs.
  • Serves as a plug-and-play approach for GPTQ/Omniquant/…, improving inference-friendly methods under low-bit quantization (a generic illustrative sketch follows this entry).
[paper] [code] [abstract]
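To give a rough feel for what group-wise mixed-precision weight quantization involves, here is a minimal, hypothetical PyTorch sketch. It is not the SliM-LLM algorithm: the per-group salience proxy (mean absolute weight), the 1-bit/3-bit split around a 2-bit average budget, and all function names are illustrative assumptions only.

```python
import torch

def quantize_groups(g: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform quantize-dequantize with one scale/zero-point per group (last dim)."""
    qmax = 2 ** bits - 1
    g_min = g.min(dim=-1, keepdim=True).values
    g_max = g.max(dim=-1, keepdim=True).values
    scale = (g_max - g_min).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round((g - g_min) / scale), 0, qmax)
    return q * scale + g_min

def groupwise_mixed_precision(weight: torch.Tensor, group_size: int = 128,
                              low_bits: int = 1, high_bits: int = 3) -> torch.Tensor:
    """Quantize-dequantize `weight`, giving 'salient' groups more bits.

    Assumes in_features is divisible by group_size. Salience is approximated by
    each group's mean absolute weight -- an illustrative proxy, not the paper's.
    """
    out_features, in_features = weight.shape
    groups = weight.reshape(out_features, in_features // group_size, group_size)
    salience = groups.abs().mean(dim=-1)                  # (out_features, n_groups)
    threshold = salience.median(dim=-1, keepdim=True).values
    high = salience >= threshold                          # top ~half of groups per row
    deq = torch.empty_like(groups)
    deq[high] = quantize_groups(groups[high], high_bits)
    deq[~high] = quantize_groups(groups[~high], low_bits)
    return deq.reshape(out_features, in_features)

# Example: with half the groups at 3 bits and half at 1 bit, the average is ~2 bits.
w = torch.randn(256, 1024)
w_q = groupwise_mixed_precision(w)
print((w - w_q).abs().mean())
```

In SliM-LLM the bit allocation is driven by the salience structure analyzed in the paper and is plugged into existing PTQ pipelines such as GPTQ; the sketch only conveys the per-group bookkeeping.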
ICCAD 2024

SNNGX: Securing Spiking Neural Networks with Genetic XOR Encryption on RRAM-based Neuromorphic Accelerator

Kwunhang Wong, Songqi Wang, Wei Huang, Xinyuan Zhang, Yangu He, Karl M.H. Lai, Yuzhong Jiao, Ning Lin, Xiaojuan Qi, Xiaoming Chen, Zhongrui Wang

  • The first IP protection scheme specifically for SNNs, leveraging a genetic algorithm combined with classic XOR encryption to secure the networks against unauthorized access and tampering.
  • A flexible solution for securing SNNs across various applications, especially in critical domains such as biomedical applications, where model security is paramount.
[paper] [code] [abstract]
Arxiv

An Empirical Study of LLaMA3 Quantization: From LLMs to MLLMs

Wei Huang, Xingyu Zheng, Xudong Ma, Haotong Qin, Chengtao Lv, Hong Chen, Jie Luo, Xiaojuan Qi, Xianglong Liu, Michele Magno

  • Explores the performance of LLaMA3-series models under existing post-training quantization and LoRA fine-tuning methods.
  • Points out the significant performance loss of LLaMA3-based MLLMs under low-bit post-training quantization.
  • Highlights the significant performance gap under low bit-widths that needs to be bridged in future developments.
[paper] [code] [abstract]
ICML 2024

BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Wei Huang, Yangdong Liu, Haotong Qin, Ying Li, Shiming Zhang, Xianglong Liu, Michele Magno, Xiaojuan Qi

  • Compresses LLM weights to as low as 1.08–1.1 bits, exceeding the performance of previous quantization methods at 2 bits or even 3 bits.
  • Implements high-performance binary LLMs in PTQ mode, efficiently achieving 1-bit LLM compression without additional training or backpropagation (a simple binarization baseline is sketched below for intuition).
[paper] [code] [abstract]
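For intuition about what "nearly 1-bit" weights mean, here is a textbook scaled-sign binarization baseline in PyTorch. It is not BiLLM's method (BiLLM additionally treats salient weights and the bell-shaped distribution of the remaining weights separately); it is only the starting point that such PTQ work improves on.

```python
import torch

def binarize_rowwise(weight: torch.Tensor) -> torch.Tensor:
    """Approximate W by alpha * sign(W) with one scale alpha per output row.

    For this form, alpha = mean(|W|) minimizes the L2 reconstruction error
    (the classic XNOR-Net-style binarization), giving ~1 bit per weight plus
    a small per-row overhead for the scales.
    """
    alpha = weight.abs().mean(dim=1, keepdim=True)   # per-row scaling factor
    return alpha * torch.sign(weight)

# Measure how much error plain binarization leaves on a random matrix; closing
# this gap on real LLM weights is what post-training methods like BiLLM target.
w = torch.randn(1024, 1024)
rel_err = ((w - binarize_rowwise(w)).norm() / w.norm()).item()
print(f"relative reconstruction error: {rel_err:.3f}")
```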
Arxiv

On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks

Wei Huang, Haotong Qin, Yangdong Liu, Jingzhuo Liang, Yulun Zhang, Ying Li, Xianglong Liu

  • Combines IP-core-level chip runtime clock and power awareness with network sensitivity, achieving a better balance of computational efficiency and accuracy on edge devices.
  • Allows target networks to be compressed and deployed with high accuracy on edge chips with limited computational resources and ultra-low power consumption.
  • Efficiently performs online quantization and optimization without additional devices or data access.
[paper] [abstract]
Arxiv

VLSNR: Vision-Linguistics Coordination Time Sequence-aware News Recommendation

Songhao Han*, Wei Huang*, Xiaotian Luan*

  • Constructs a large-scale multimodal news recommendation dataset, V-MIND, which facilitates future research on news recommendation in the multimodal domain and improves the learning results of VLSNR.
  • Integrates visual and textual news information in time series to learn click-preference trends.
[paper] [code] [abstract]

📖 Education

  • 2023.09 - now, Ph.D. student in the Department of Electrical and Electronic Engineering, The University of Hong Kong.
  • 2019.09 - 2023.06, B.Eng. in Computer Science, School of Computer Science and Engineering, Beihang University.

🗒️ Academic Services

  • Conference: ICLR, NeurIPS, ICML, ECCV, AISTATS.
  • Journal: Neural Networks.
  • Program Committee member for Practical Deep Learning Workshop, IEEE CAI 2024.

🎖 Honors and Awards

  • 2019-2023 (B.Eng.): Outstanding Graduate, Beihang University (2023). Outstanding Project of the 16th National College Student Innovation and Entrepreneurship Competition, China (2023). Outstanding Project of the 15th National College Student Innovation and Entrepreneurship Competition, China (2022). Second-class Social Practice Scholarship, Beihang University (2022). Third-class Innovation and Entrepreneurship Scholarship, Beihang University (2021). Second-class Subject Competition Scholarship, Beihang University (2022). 3rd Prize of the 32nd “Feng Ru Cup” Competition (2022). Second-class Scholarship, Beihang University (2022). 3rd Prize of the “Lan Qiao Cup” Programming Competition (Python), Beijing (2022). Second-class Social Practice Scholarship, Beihang University (2021). Second-class Subject Competition Scholarship, Beihang University (2021). Outstanding Teaching Assistant, Beihang University (2021). 2nd Prize of the 31st “Feng Ru Cup” Competition (2020). First-class Scholarship, Beihang University (2020).

💻 Internships & Teaching Services

  • 2022.09 - 2023.01, AI algorithm intern working on model inference acceleration, Enflame, China.
  • 2022.08 - 2023.01, TA for Frontiers in Artificial Intelligence, Beihang University.
  • 2022.08 - 2023.01, TA for Computer Hardware Basics (head of the TA team), Beihang University.
  • 2021.08 - 2022.01, TA for Computer Hardware Basics (head of the TA team), Beihang University.
  • 2021.03 - 2021.06, TA for Discrete Mathematics (head of the TA team), Beihang University.