Workshop
SCOPE: SCALABLE OPTIMIZATION FOR EFFICIENT AND ADAPTIVE FOUNDATION MODELS
Souvik Kundu · Tianlong Chen · Shiwei Liu · Haizhong Zheng · Amir Yazdanbakhsh · Beidi Chen · Yingyan Celine Lin
Peridot 204-205
Sun 27 Apr, 5:30 p.m. PDT
In the rapidly evolving landscape of AI, scalable optimization methods that yield efficient and adaptive foundation models are in significant demand for inference serving. Specifically, making models efficient while keeping them adaptable to new downstream tasks poses multiple challenges. First, quickly learning adaptive and efficient sub-model selection for different tasks requires the capability to perform continual weight updates, compute- and memory-efficient fine-tuning, and personalized adaptation. Second, with the increasing demand for long-context understanding and reasoning, the model must achieve such efficient adaptation while fetching only the tokens that are informative for the query at hand. For instance, imagine a model that continually learns from current news events, adapting to the ever-changing global landscape by integrating up-to-date knowledge. Such a model may need not only efficient fine-tuning on new incoming data streams, but also efficient handling of a KV cache that keeps growing as longer contextual information must be processed. Additionally, integrating retrieval-augmented generation (RAG) into foundation models can ensure that generated content is not only relevant but also reflects the most current knowledge, at the cost of a larger prefill. Third, with this growing demand for contextual adaptation, mixture-of-experts (MoE) models have gained significant traction, as they can perform test-time adaptation via a learned routing policy. In addition, the emergence of sub-quadratic models with constant-size KV states, as opposed to the growing KV caches of transformers, has opened a new avenue for a model's adaptation ability in the context of information retention in compressive KV states. These capabilities rely on techniques for adapting foundation models, including fine-tuning, conversion, distillation, and in-context/few-shot learning. This workshop aims to capture advances in scalable, adaptive fine-tuning, calibration, and conversion toward inference-efficient quadratic and sub-quadratic foundation models, focusing on methodologies across vision, language, and multi-modal domains.
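As a concrete illustration of the long-context memory pressure described above, the short sketch below (with purely hypothetical model dimensions, not drawn from any particular model or paper at this workshop) contrasts the KV-cache footprint of a standard transformer, which grows linearly with the number of cached tokens, against the constant-size compressive state of a sub-quadratic model.

```python
def transformer_kv_bytes(context_len, n_layers=32, n_kv_heads=8,
                         head_dim=128, bytes_per_elem=2):
    """KV cache of a standard transformer: one key and one value vector
    per token, per layer, per KV head; grows linearly with context length.
    All dimensions here are illustrative assumptions."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context_len

def recurrent_state_bytes(n_layers=32, n_heads=8, state_dim=16,
                          head_dim=128, bytes_per_elem=2):
    """Compressive state of a sub-quadratic (e.g., SSM-style) model:
    a fixed-size matrix per layer and head, independent of context length."""
    return n_layers * n_heads * state_dim * head_dim * bytes_per_elem

for ctx in (4_096, 131_072):
    print(f"context={ctx:>7}: "
          f"KV cache ~ {transformer_kv_bytes(ctx) / 2**30:.1f} GiB, "
          f"recurrent state ~ {recurrent_state_bytes() / 2**20:.1f} MiB")
```

Under these assumed dimensions, growing the context from 4K to 128K tokens inflates the KV cache from roughly 0.5 GiB to 16 GiB per sequence, while the recurrent state stays at about 1 MiB, which is the tension between retention and efficiency that the workshop topics above address.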
Schedule
Sun 5:30 p.m. - 5:45 p.m. | Introduction (Opening remarks by organizers) | Amir Yazdanbakhsh · Souvik Kundu · Shiwei Liu
Sun 5:45 p.m. - 6:15 p.m. | Invited Talk 1 | Yu Cheng
Sun 6:15 p.m. - 6:45 p.m. | Invited Talk 2 | Zechun Liu
Sun 6:45 p.m. - 7:15 p.m. | Invited Talk 3 | Zhangyang Wang
Sun 7:15 p.m. - 7:30 p.m. | Coffee Break
Sun 7:30 p.m. - 8:00 p.m. | Workshop Spotlight Papers I (Oral Presentation Session)
Sun 8:00 p.m. - 9:00 p.m. | Morning Poster Session
Sun 9:00 p.m. - 10:00 p.m. | Lunch Break
Sun 10:00 p.m. - 10:30 p.m. | Invited Talk 4 | Zhuang Liu
Sun 10:30 p.m. - 11:00 p.m. | Invited Talk 5 | Bryan Kian Hsiang Low
Sun 11:00 p.m. - 12:00 a.m. | Workshop Spotlight Papers II (Oral Presentation Session)
Mon 12:00 a.m. - 12:15 a.m. | Coffee Break
Mon 12:15 a.m. - 12:45 a.m. | Invited Talk 6 | Pavlo Molchanov
Mon 12:45 a.m. - 1:15 a.m. | Invited Talk 7 | Ziwei Liu
Mon 1:15 a.m. - 1:45 a.m. | Panel Discussion
Mon 1:45 a.m. - 2:00 a.m. | Closing Remarks | Souvik Kundu · Shiwei Liu
Mon 2:00 a.m. - 3:00 a.m. | Afternoon Poster Session
- The Curse of Depth in Large Language Models (Poster) | Wenfang Sun · Xinyuan Song · Pengxiang Li · Lu Yin · Yefeng Zheng · Shiwei Liu
- SageAttention2: Efficient Attention with Smoothing Q and Per-thread Quantization (Poster) | Jintao Zhang · Haofeng Huang · Pengle Zhang · Jia wei · Jun Zhu · Jianfei Chen
- SageAttention2: Efficient Attention with Smoothing Q and Per-thread Quantization (Oral) | Jintao Zhang · Haofeng Huang · Pengle Zhang · Jia wei · Jun Zhu · Jianfei Chen
- Effortless Efficiency: Low-Cost Pruning of Diffusion Models (Poster) | Yang Zhang · Er Jin · Yanfei Dong · Ashkan Khakzar · Philip Torr · Johannes Stegmaier · Kenji Kawaguchi
- Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing (Poster) | Aviv Bick · Tobias Katsch · Nimit Sohoni · Arjun Desai · Albert Gu
- M2R2: EFFICIENT TRANSFORMERS WITH MIXTURE OF MULTI-RATE RESIDUALS (Poster) | Nikhil Bhendawade · Mahyar Najibi · Devang Naik · Irina Belousova
- M2R2: EFFICIENT TRANSFORMERS WITH MIXTURE OF MULTI-RATE RESIDUALS (Oral) | Nikhil Bhendawade · Mahyar Najibi · Devang Naik · Irina Belousova
- ResQ: Mixed-Precision Quantization of Large Language Models with Low-Rank Residuals (Poster) | Utkarsh Saxena · Sayeh Sharify · Kaushik Roy · Xin Wang
- ResQ: Mixed-Precision Quantization of Large Language Models with Low-Rank Residuals (Oral) | Utkarsh Saxena · Sayeh Sharify · Kaushik Roy · Xin Wang
- UniForm: A Reuse Attention Mechanism for Efficient Transformers on Resource-Constrained Edge Devices (Poster) | Seul-Ki Yeom · Tae-Ho Kim
- OPPA: OPtimizing PArallelism for Language Model Training (Poster) | Apivich Hemachandra · Yizhan Han · See-Kiong Ng · Bryan Kian Hsiang Low
- A Unified Approach to Routing and Cascading for LLMs (Poster) | Jasper Dekoninck · Maximilian Baader · Martin Vechev
- Neuromorphic Principles for Efficient Large Language Models on Intel Loihi 2 (Poster) | Steven Abreu · Sumit Shrestha · Rui-Jie Zhu · Jason Eshraghian
- Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters (Poster) | Kevin Li · Sachin Goyal · João D Semedo · Zico Kolter
- Fast Gradient Computation for RoPE Attention in Almost Linear Time (Poster) | Yifang Chen · Jiayan Huo · Xiaoyu Li · Yingyu Liang · Zhenmei Shi · Zhao Song
- DARS: ROBUST SPARSE FINE-TUNING WITH REGULARIZED SUBSPACE DISALIGNMENT (Poster) | Sumin Park · Noseong Park
- Towards Infinite-Long Prefix in Transformers (Poster) | Yingyu Liang · Zhenmei Shi · Zhao Song · Chiwun Yang
- Towards Infinite-Long Prefix in Transformers (Oral) | Yingyu Liang · Zhenmei Shi · Zhao Song · Chiwun Yang
- ChamaleonLLM: Batch-Aware Dynamic Low-Rank Adaptation via Inference-Time Clusters (Poster) | Kamer Yuksel · Hassan Sawaf
- FedEx-LoRA: Exact Aggregation for Federated and Efficient Fine-Tuning of Foundation Models (Poster) | RAGHAV SINGHAL · Kaustubh Ponkshe · Praneeth Vepakomma
- Graph Low-Rank Adapters of High Regularity for Graph Neural Networks and Graph Transformers (Poster) | Pantelis Papageorgiou · Haitz Sáez de Ocáriz Borde · Anastasis Kratsios · Michael Bronstein
- In-batch Ensemble Drafting: Robust Speculative Decoding for LVLMs (Poster) | Minjae Lee · Wonjun Kang · Byeongkeun Ahn · Christian Classen · Minghao Yan · Hyung Koo · Kangwook Lee
- Yes, Q-learning Helps Offline In-Context RL (Poster) | Denis Tarasov · Alexander Nikulin · Ilya Zisman · Albina Klepach · Andrei Polubarov · Lyubaykin Nikita · Alexander Derevyagin · Igor Kiselev · Vladislav Kurenkov
- Relevance Isn't All You Need: Scaling RAG Systems With Inference-Time Compute Via Multi-Criteria Reranking (Poster) | Will LeVine · Bijan Varjavand
- QMambaExtend: Improving Long-Context Extension of Memory-Efficient Mamba Models (Poster) | Seyedarmin Azizi · Souvik Kundu · Mohammad Sadeghi · Massoud Pedram
- Compositional Subspace Representation Fine-tuning for Adaptive Large Language Models (Poster) | Andy Zhou · Ron Arel
- Low-Rank Continual Personalization of Diffusion Models (Poster) | Łukasz Staniszewski · Katarzyna Zaleska · Kamil Deja
- Acceleration Multiple Heads Decoding for LLM via Dynamic Tree Attention (Poster) | Zhendong Zhang
- Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning (Poster) | Kaustubh Ponkshe · RAGHAV SINGHAL · Eduard Gorbunov · Alexey Tumanov · Samuel Horváth · Praneeth Vepakomma
- Fixed-Point RNNs: From Diagonal to Dense in a Few Iterations (Poster) | Sajad Movahedi · Felix Sarnthein · Nicola Muca Cirone · Antonio Orvieto
- Domain-Invariant Prompt Learning for Vision-Language Models (Poster) | Arsham Gholamzadeh Khoee · Yinan Yu · Robert Feldt
- AdaPTS: Adapting Univariate Foundation Models to Probabilistic Multivariate Time Series Forecasting (Poster) | Abdelhakim Benechehab · Vasilii Feofanov · Giuseppe Paolo · Albert Thomas · Maurizio Filippone · Balázs Kégl
- Universal LLM Routing with Correctness-Based Representation (Poster) | Wittawat Jitkrittum · Harikrishna Narasimhan · Ankit Singh Rawat · Jeevesh Juneja · Zifeng Wang · Chen-Yu Lee · Pradeep Shenoy · Rina Panigrahy · Aditya Krishna Menon · Sanjiv Kumar
- Conformal Transformations for Symmetric Power Transformers (Poster) | Saurabh Kumar · Jacob Buckman · Carles Gelada · Xiaowen Zhang
- SPAM: SPIKE-AWARE ADAM WITH MOMENTUM RESET FOR STABLE LLM TRAINING (Poster) | Tianjin Huang · Ziquan Zhu · Gaojie Jin · Lu Liu · Zhangyang Wang · Shiwei Liu
- STIV: SCALABLE TEXT AND IMAGE CONDITIONED VIDEO GENERATION (Poster) | Zongyu Lin · Wei Liu · Chen Chen · Jiasen Lu · Wenze Hu · Tsu-Jui Fu · Jesse Allardice · Zhengfeng Lai · Liangchen Song · Bowen Zhang · cha chen · Yiran Fei · Yifan Jiang · Lezhi Li · Yizhou Sun · Kai-Wei Chang · Yinfei Yang
- STIV: SCALABLE TEXT AND IMAGE CONDITIONED VIDEO GENERATION (Oral) | Zongyu Lin · Wei Liu · Chen Chen · Jiasen Lu · Wenze Hu · Tsu-Jui Fu · Jesse Allardice · Zhengfeng Lai · Liangchen Song · Bowen Zhang · cha chen · Yiran Fei · Yifan Jiang · Lezhi Li · Yizhou Sun · Kai-Wei Chang · Yinfei Yang
- MixER: Better Mixture of Experts Routing for Hierarchical Meta-Learning (Poster) | Roussel Desmond Nzoyem · Grant Stevens · Amarpal Sahota · David Barton · Tom Deakin
- N-Gram Induction Heads for In-Context RL: Improving Stability and Reducing Data Needs (Poster) | Ilya Zisman · Alexander Nikulin · Viacheslav Sinii · Denis Tarasov · Lyubaykin Nikita · Andrei Polubarov · Igor Kiselev · Vladislav Kurenkov
- PENCIL: Long Thoughts with Short Memory (Poster) | Chenxiao Yang · Nathan Srebro · David McAllester · Zhiyuan Li
- Margin-aware Preference Optimization for Aligning Diffusion Models without Reference (Poster) | Jiwoo Hong · Sayak Paul · Noah Lee · Kashif Rasul · James Thorne · Jongheon Jeong
- Efficient Distributed Optimization under Heavy-Tailed Noise (Poster) | Su Lee · Manzil Zaheer · Tian Li
- Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts (Poster) | Weigao Sun · Disen Lan · Tong Zhu · Xiaoye Qu · Yu Cheng
- Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts (Oral) | Weigao Sun · Disen Lan · Tong Zhu · Xiaoye Qu · Yu Cheng
- Overtrained Language Models Are Harder to Fine-Tune (Poster) | Jacob Springer · Sachin Goyal · Kaiyue Wen · Tanishq Kumar · Xiang Yue · Sadhika Malladi · Graham Neubig · Aditi Raghunathan
- Overtrained Language Models Are Harder to Fine-Tune (Oral) | Jacob Springer · Sachin Goyal · Kaiyue Wen · Tanishq Kumar · Xiang Yue · Sadhika Malladi · Graham Neubig · Aditi Raghunathan
- Grams: Gradient Descent with Adaptive Momentum Scaling (Poster) | Yang Cao · Xiaoyu Li · Zhao Song
- Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam (Poster) | Tianjin Huang · Haotian Hu · Zhenyu Zhang · Gaojie Jin · Xiang Li · Li Shen · Tianlong Chen · Lu Liu · Qingsong Wen · Zhangyang Wang · Shiwei Liu
- Efficient Open-set Test Time Adaptation of Vision Language Models (Poster) | Manogna Sreenivas · Soma Biswas
- Enhanced Continual Learning of Vision-Language Models with Model Fusion (Poster) | Haoyuan Gao · Zicong Zhang · Yuqi Wei · Linglan Zhao · Guilin Li · Yexin Li · Linghe Kong · Weiran Huang
- DLO: Dynamic Layer Operation for Efficient Vertical Scaling of LLMs (Poster) | Zhen Tan · Daize Dong · Xinyu Zhao · Jianing Cai · Jie Peng · Yu Cheng · Tianlong Chen
- AsymLoRA: Unlocking the Power of Multimodal LLMs via Asymmetric LoRA (Poster) | Xuyang Wei · Chunlin Tian · Li Li
- Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity (Poster) | Victor Weixin Liang · Junhong Shen · Genghan Zhang · Ning Dong · Luke Zettlemoyer · Lili Yu
- Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity (Oral) | Victor Weixin Liang · Junhong Shen · Genghan Zhang · Ning Dong · Luke Zettlemoyer · Lili Yu
- Revisiting Associative Recall in Modern Recurrent Models (Poster) | Destiny Okpekpe · Antonio Orvieto
- Layer Normalization Improves Length Generalization (Poster) | Ruining Li · Gabrijel Boduljak · Jinghao Zhou
- KV Prediction for Improved Time to First Token (Poster) | Maxwell Horton · Qingqing Cao · Chenfan Sun · Yanzi Jin · Sachin Mehta · Mohammad Rastegari · Moin Nabi
- Adaptive Length Image Tokenization via Recurrent Allocation (Poster) | Shivam Duggal · Phillip Isola · Antonio Torralba · William Freeman
- RecurFormer: Not All Transformer Heads Need Self-Attention (Poster) | Ruiqing Yan · Linghan Zheng · Xingbo Du · Han Zou · Yufeng Guo · Jianfei Yang
- Training Domain Draft Models for Speculative Decoding: Best Practices and Insights (Poster) | Fenglu Hong · Ravi Raju · Jonathan Li · Bo Li · Urmish Thakker · Avinash Ravichandran · Swayambhoo Jain · Changran Hu
- Context Is All You Need: Efficient Retrieval Augmented Generation for Domain Specific AI (Poster) | Peixi Xiong · Chaunte W. Lacewell · Sameh Gobriel · Nilesh Jain
- XAMBA: Enabling Efficient State Space Models on Resource-Constrained Neural Processing Units (Poster) | Arghadip Das · Arnab Raha · Shamik Kundu · Soumendu Ghosh · Deepak Mathaikutty · Vijay Raghunathan
- LANTERN++: Enhanced Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models (Poster) | Sihwan Park · Doohyuk Jang · Sung-Yub Kim · Souvik Kundu · Eunho Yang
- LANTERN++: Enhanced Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models (Oral) | Sihwan Park · Doohyuk Jang · Sung-Yub Kim · Souvik Kundu · Eunho Yang
- Attention Is All You Need For Mixture-of-Depths Routing (Poster) | Advait Gadhikar · Souptik Kumar Majumdar · Niclas Popp · Piyapat Saranrittichai · Martin Rapp · Lukas Schott