Workshop
Modular, Collaborative and Decentralized Deep Learning
Arthur Douillard · Haokun Liu · Wanru Zhao · Colin Raffel · Marco Ciccone · Prateek Yadav
Hall 4 #3
Sat 26 Apr, 5:50 p.m. PDT
The increasing complexity of modern machine learning models exposes the limitations of the traditional, monolithic approach to their development, raising concerns about cost and sustainability. This workshop challenges this approach by advocating for a new paradigm based on modular design and functional specialization. Inspired by principles from software engineering, we envision a future where models are composed of independently trainable modules, enabling asynchronous development, incremental updates, and cross-task generalization through composability. This shift towards modularity unlocks new possibilities for collaborative model development, where researchers can contribute specialized modules, combine existing models, and participate in decentralized training schemes. By embracing modularity, we can democratize deep learning research, enabling smaller teams and institutions to contribute to the development of powerful and efficient models. Furthermore, modularity promises to enhance model interpretability and maintainability, paving the way for more robust and efficient AI systems. This workshop aims to accelerate this transition towards a more collaborative and sustainable future for deep learning.
Schedule
Sat 5:50 p.m. - 6:00 p.m. | Opening Remarks (Introduction and Opening Remarks)
Arthur Douillard
Sat 6:00 p.m. - 6:15 p.m. | How to Merge Multimodal Models Over Time? (Oral)
Sebastian Dziadzio · Vishaal Udandarao · Karsten Roth · Ameya Prabhu · Zeynep Akata · Samuel Albanie · Matthias Bethge
Sat 6:15 p.m. - 6:30 p.m. | MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling (Oral)
Rachel Teo · Tan Nguyen
Sat 6:30 p.m. - 7:00 p.m. | Keynote #1: Ahmet Üstün – Cohere For AI (Keynote)
Ahmet Üstün
Sat 7:00 p.m. - 7:15 p.m. | Coffee Break ☕️
Sat 7:15 p.m. - 7:30 p.m. | Exploring Asynchronism in SWARM Parallelism (Oral)
Yan Zuo · Gil Avraham · Thalaiyasingam Ajanthan · Sameera Ramasinghe · Alexander Long
Sat 7:30 p.m. - 7:45 p.m. | Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging (Oral)
Pierre Ablin · Angelos Katharopoulos · Skyler Seto · David Grangier
Sat 7:45 p.m. - 8:15 p.m. | Keynote #2: Sneha Kudugunta – Google DeepMind (Keynote)
Sneha Kudugunta
Sat 8:15 p.m. - 8:45 p.m. | Keynote #3: Olga Golovneva – Meta AI (Keynote)
Olga Golovneva
Sat 8:45 p.m. - 9:30 p.m. | Poster Session #1
Sat 9:30 p.m. - 11:00 p.m. | Lunch Break
Sat 11:00 p.m. - 11:15 p.m. | Improving the Efficiency of Distributed Training using Sparse Parameter Averaging (Oral)
Matt Beton · Seth Howes · Alex Cheema · Mohamed Baioumy
Sat 11:15 p.m. - 11:30 p.m. | (Oral)
Sat 11:30 p.m. - 12:00 a.m. | Keynote #4: Max Ryabinin – Together AI (Keynote)
Maksim Riabinin
Sun 12:00 a.m. - 12:15 a.m. | Coffee Break ☕️
Sun 12:15 a.m. - 1:00 a.m. | Poster Session #2
Sun 1:00 a.m. - 1:30 a.m. | Keynote #5: Jonas Pfeiffer – Google DeepMind (Keynote)
Sun 1:30 a.m. - 2:00 a.m. | Keynote #6: Sami Jaghouar – PrimeIntellect (Keynote)
Sun 2:00 a.m. - 2:50 a.m. | Panel Discussions (Panel)
Sun 2:50 a.m. - 3:00 a.m. | Closing Remarks
Arthur Douillard
Collective Model Intelligence Requires Compatible Specialization (Poster)
Jyothish Pari · Samy Jelassi · Pulkit Agrawal
Training Plug n' Play Knowledge Modules with Deep Context Distillation (Poster)
Lucas Caccia · Alan Ansell · Ivan Vulić · Edoardo M. Ponti · Alessandro Sordoni
Exact Unlearning of Finetuning Data via Model Merging at Scale (Poster)
Kevin Kuo · Amrith Setlur · Kartik Srinivas · Aditi Raghunathan · Virginia Smith
A Framework for Double-Blind Federated Adaptation of Foundation Models (Poster)
Nurbek Tastan · Karthik Nandakumar
Improving the Efficiency of Distributed Training using Sparse Parameter Averaging (Poster)
Matt Beton · Seth Howes · Alex Cheema · Mohamed Baioumy
CAMEx: Curvature-aware Merging of Experts (Poster)
Dung Viet Nguyen · Minh Nguyen · Luc Nguyen · Rachel Teo · Tan Nguyen · Duy Linh Tran
Multi-Agent Verification: Scaling Test-Time Compute with Goal Verifiers (Poster)
Shalev Lifshitz · Sheila McIlraith · Yilun Du
Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging (Poster)
Pierre Ablin · Angelos Katharopoulos · Skyler Seto · David Grangier
MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling (Poster)
Rachel Teo · Tan Nguyen
Rethinking Decentralized Learning: Towards More Realistic Evaluations with a Metadata-Agnostic Approach (Poster)
Tianyu Zhang · Lu Li · Tongtian Zhu · Suyuchen Wang · Can Wang · Yong Chen
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models (Poster)
Victor Weixin Liang · Lili Yu · Liang Luo · Srini Iyer · Ning Dong · Chunting Zhou · Gargi Ghosh · Mike Lewis · Luke Zettlemoyer · Victoria Lin
Federated Circuits: A Unified Framework for Scalable and Efficient Federated Learning (Poster)
Jonas Seng · Florian Busch · Pooja Prasad · Devendra Singh Dhami · Martin Mundt · Kristian Kersting
ReM: Sparsify and MoEfy Models with Post-Hoc ReLU Modulation (Poster)
Wenbo Zhang · Xiang Ren
Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer (Poster)
Yilun Kong · Guozheng Ma · Qi Zhao · Haoyu Wang · Li Shen · Xueqian Wang · Dacheng Tao
BICEC: Attachable Classification-Based Intelligent Control for Sustainable Computer Vision Systems (Poster)
Jonathan W Burton-Barr · Deepu Rajan · Basura Fernando
Conditioning on Local Statistics for Scalable Heterogeneous Federated Learning (Tiny Paper) (Poster)
Rickard Nakamura Brännvall
Hierarchical Subspaces of Policies for Continual Offline Reinforcement Learning (Poster)
Anthony Kobanda · Rémy Portelas · Odalric-Ambrym Maillard · Ludovic Denoyer
An Empirical Study of Policy Interpolation via Diffusion Models (Poster)
Yuqing Xie · Chao Yu · Ya Zhang · Yu Wang
Beyond Top-K: Structured Sparsification for Compression in Pipeline Parallel (Poster)
Sameera Ramasinghe · Thalaiyasingam Ajanthan · Gil Avraham · Yan Zuo · Alexander Long
NoEsis: A Modular LLM with Differentially Private Knowledge Transfer (Poster)
Rob Romijnders · Stefanos Laskaridis · Ali Shahin Shamsabadi · Hamed Haddadi
Truncate without Fear: Module Aggregation and Redistribution in Federated Low-Rank Adaptation (Poster)
Chen · Yuxing Liu · Arindam Banerjee
On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists (Poster)
Dongyang Fan · Bettina Messmer · Nikita Doikov · Martin Jaggi
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation (Poster)
Rinon Gal · Adi Haviv · Yuval Alaluf · Amit Bermano · Daniel Cohen-Or · Gal Chechik
Efficient Distributed Optimization under Heavy-Tailed Noise (Poster)
Su Lee · Manzil Zaheer · Tian Li
How to Merge Multimodal Models Over Time? (Poster)
Sebastian Dziadzio · Vishaal Udandarao · Karsten Roth · Ameya Prabhu · Zeynep Akata · Samuel Albanie · Matthias Bethge
Revisiting Sparse Mixture of Experts for Resource-adaptive Federated Fine-tuning Foundation Models (Poster)
Van-Tuan Tran · Le Khiem · Viet Pham
Exploring Asynchronism in SWARM Parallelism (Poster)
Yan Zuo · Gil Avraham · Thalaiyasingam Ajanthan · Sameera Ramasinghe · Alexander Long
FedMoDN: Federated Modular Decision Support Networks (Poster)
Cécile Trottet · Michael Krauthammer · Mary-Anne Hartley
Robust Online Inference Using Adaptive Model Switching (Poster)
Kalpan Mukherjee · Vikramank Singh · Abishek Sankararaman · Balakrishnan Narayanaswamy · Tim Kraska
Adaptive Local Training in Federated Learning (Poster)
Donald Shenaj · Eugene Belilovsky · Pietro Zanuttigh
Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts (Poster)
Samin Yeasar Arnob · Zhan Su · Minseon Kim · Oleksiy Ostapenko · Doina Precup · Lucas Caccia · Alessandro Sordoni
HDEE: Heterogeneous Domain Expert Ensemble (Poster)
Oguzhan Ersoy · Jari Kolehmainen · Gabriel Andrade
Tight Clusters Make Specialized Experts (Poster)
Stefan Nielsen · Rachel Teo · Laziz Abdullaev · Tan Nguyen
Momentum Look-Ahead for Asynchronous Distributed Low-Communication Training (Poster)
Thalaiyasingam Ajanthan · Sameera Ramasinghe · Gil Avraham · Yan Zuo · Alexander Long
Disentangling Sequence Memorization and General Capability in Large Language Models (Poster)
Gaurav Ghosal · Pratyush Maini · Aditi Raghunathan