Workshop
Building Trust in LLMs and LLM Applications: From Guardrails to Explainability to Regulation
Micah Goldblum · Ramasuri Narayanam · Bang An · Soumyabrata Pal · Martin Pawelczyk · Hima Lakkaraju · Shiv Saini
Hall 4 #6
Sun 27 Apr, 5:50 p.m. PDT
As Large Language Models (LLMs) are rapidly adopted across diverse industries, concerns around their trustworthiness, safety, and ethical implications increasingly motivate academic research, industrial development, and legal innovation. LLMs are now integrated into complex applications, where they must navigate challenges related to data privacy, regulatory compliance, and dynamic user interactions. These complex applications amplify the risk that LLMs will violate human trust. Ensuring the trustworthiness of LLMs is paramount as they transition from standalone tools to integral components of real-world applications used by millions. This workshop addresses the unique challenges posed by the deployment of LLMs, ranging from guardrails to explainability to regulation and beyond. It brings together researchers and practitioners from academia and industry to explore cutting-edge solutions for improving the trustworthiness of LLMs and LLM-driven applications. The workshop features invited talks, a panel discussion, interactive breakout discussion sessions, and poster presentations, fostering rich dialogue and knowledge exchange. We aim to bridge the gap between foundational research and the practical challenges of deploying LLMs in trustworthy, user-centric systems.
Schedule
Sun 5:50 p.m. - 6:00 p.m. | Introduction and Opening Remarks (Intro)
Sun 6:00 p.m. - 6:40 p.m. | Invited Talk 1 - Reza Shokri (Invited Talk)
Sun 6:40 p.m. - 7:20 p.m. | Invited Talk 2 - Yoshua Bengio (Invited Talk)
Sun 7:20 p.m. - 7:35 p.m. | Break
Sun 7:35 p.m. - 8:15 p.m. | Invited Talk 3 - Shayne Longpre (Invited Talk)
Sun 8:15 p.m. - 9:15 p.m. | Poster Session 1 (Poster Session)
Sun 9:15 p.m. - 10:30 p.m. | Lunch Break
Sun 10:30 p.m. - 11:10 p.m. | Invited Talk 4 - Edoardo Debenedetti (Invited Talk)
Sun 11:10 p.m. - 11:50 p.m. | Oral Presentations 1 (Oral)
Sun 11:50 p.m. - Mon 12:25 a.m. | Panel Discussion (Panel)
Mon 12:25 a.m. - 12:40 a.m. | Break
Mon 12:40 a.m. - 1:20 a.m. | Invited Talk 5 - Jonas Geiping (Invited Talk)
Mon 1:20 a.m. - 2:00 a.m. | Oral Presentations 2 (Oral)
Mon 2:00 a.m. - 3:00 a.m. | Poster Session 2 (Poster Session)
A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection (Poster) | Gabriel Chua · Chan Yee · Shaun Khoo
On the Role of Prompt Multiplicity in LLM Hallucination Evaluation (Poster) | Prakhar Ganesh · Reza Shokri · Golnoosh Farnadi
Unlearning Geo-Cultural Stereotypes in Multilingual LLMs (Poster) | Alireza Dehghanpour Farashah · Aditi Khandelwal · Negar Rostamzadeh · Golnoosh Farnadi
Measuring In-Context Computation Complexity via Hidden State Prediction (Poster) | Vincent Herrmann · Róbert Csordás · Jürgen Schmidhuber
Prune 'n Predict: Optimizing LLM Decision-making with Conformal Prediction (Poster) | Harit Vishwakarma · Thomas Cook · Alan Mishler · Niccolo Dalmasso · Natraj Raman · Sumitra Ganesh
Why Do Multiagent Systems Fail? (Poster) | Melissa Pan · Mert Cemri · Lakshya A Agrawal · Shuyi Yang · Bhavya Chopra · Rishabh Tiwari · Kurt Keutzer · Aditya Parameswaran · Kannan Ramchandran · Dan Klein · Joseph E Gonzalez · Matei Zaharia · Ion Stoica
Enhancing CBMs Through Binary Distillation with Applications to Test-Time Intervention (Poster) | Matthew Shen · Aliyah Hsu · Abhineet Agarwal · Bin Yu
Disentangling Linguistic Features with Dimension-Wise Analysis of Vector Embeddings (Poster) | Saniya Karwa · Navpreet Singh
Fast Proxies for LLM Robustness Evaluation (Poster) | Tim Beyer · Jan Schuchardt · Leo Schwinn · Stephan Günnemann
On-Premises LLM Deployment Demands a Middle Path: Preserving Privacy Without Sacrificing Model Confidentiality (Poster) | Hanbo Huang · Yihan Li · Bowen Jiang · Lin Liu · Bo Jiang · Ruoyu Sun · Zhuotao Liu · Shiyu Liang
Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information (Poster) | Zhengmian Hu · Gang Wu · Saayan Mitra · Ruiyi Zhang · Tong Sun · Heng Huang · Viswanathan Swaminathan
Interpretable Steering of Large Language Models with Feature Guided Activation Additions (Poster) | Samuel Soo · Wesley Teng · Balaganesh Chandrasekaran · Guoxian TAN · Ming YAN
HaluEval-Wild: Evaluating Hallucinations of Language Models in the Wild (Poster) | Zhiying Zhu · Yiming Yang · Zhiqing Sun
Analyzing Memorization in Large Language Models through the Lens of Model Attribution (Poster) | Tarun Menta · Susmit Agrawal · Chirag Agarwal
Towards Effective Discrimination Testing for Generative AI (Poster) | Thomas Zollo · Nikita Rajaneesh · Richard Zemel · Talia Gillis · Emily Black
Automated Capability Discovery via Model Self-Exploration (Poster) | Cong Lu · Shengran Hu · Jeff Clune
TEMPEST: Multi-Turn Jailbreaking of Large Language Models with Tree Search (Poster) | Andy Zhou · Ron Arel
Dynaseal: A Backend-Controlled LLM API Key Distribution Scheme with Constrained Invocation Parameters (Poster) | Jiahao Zhao · Fan Wu · 南佳怡 · 魏来 · Yang YiChen
A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage (Poster) | Rui Xin · Niloofar Mireshghallah · Shuyue Stella Li · Michael Duan · Hyunwoo Kim · Yejin Choi · Yulia Tsvetkov · Sewoong Oh · Pang Wei Koh
The Fundamental Limits of LLM Unlearning: Complexity-Theoretic Barriers and Provably Optimal Protocols (Poster) | Aviral Srivastava
Temporally Sparse Attack for Fooling Large Language Models in Time Series Forecasting (Poster) | Fuqiang Liu · Sicong Jiang
Finding Sparse Autoencoder Representations Of Errors In CoT Prompting (Poster) | Justin Theodorus · V Swaytha · Shivani Gautam · Adam Ward · Mahir Shah · Cole Blondin · Kevin Zhu
AI Companions Are Not The Solution To Loneliness: Design Choices And Their Drawbacks (Poster) | Jonas Raedler · Siddharth Swaroop · Weiwei Pan
Evaluating Text Humanlikeness via Self-Similarity Exponent (Poster) | Ilya Pershin
Hidden No More: Attack and Defending Private Third-Party LLM Inference (Poster) | Arka Pal · Rahul Thomas · Louai Zahran · Erica Choi · Akilesh Potti · Micah Goldblum
Conformal Structured Prediction (Poster) | Botong Zhang · Shuo Li · Osbert Bastani
An Empirical Study on Prompt Compression for Large Language Models (Poster) | Zheng Zhang · Jinyi Li · Yihuai Lan · Xiang Wang · Hao Wang
Evaluating and Mitigating the Safety Awareness-Execution Gaps of LM Agents (Poster) | Yuzhi Tang · Tianxiao Li · Elizabeth Li · Chris Maddison · Honghua Dong · Yangjun Ruan
SPEX: Scaling Feature Interaction Explanations for LLMs (Poster) | Justin Kang · Landon Butler · Abhineet Agarwal · Yigit Efe Erginbas · Ramtin Pedarsani · Bin Yu · Kannan Ramchandran
ExpProof: Operationalizing Explanations for Confidential Models with ZKPs (Poster) | Chhavi Yadav · Evan Laufer · Dan Boneh · Kamalika Chaudhuri
Learning Automata from Demonstrations, Examples, and Natural Language (Poster) | Karim Elmaaroufi · Marcell Vazquez-Chanlatte · Stefan Witwicki · Matei Zaharia · Sanjit Seshia
A Missing Testbed for LLM Pre-Training Membership Inference Attacks (Poster) | Mingjian Jiang · Ken Liu · Sanmi Koyejo
MALIBU Benchmark: Multi-Agent LLM Implicit Bias Uncovered (Poster) | Ishwara Vasista · Imran Mirza · Cole Huang · Rohan Patil · Aslihan Akalin · Kevin Zhu · Sean OBrien
Maybe I Should Not Answer That, but... Do LLMs Understand The Safety of Their Inputs? (Poster) | Maciej Chrabaszcz · Filip Szatkowski · Bartosz Wójcik · Jan Dubiński · Tomasz Trzcinski
Building Bridges, Not Walls: Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution (Poster) | Shichang Zhang · Tessa Han · Usha Bhalla · Hima Lakkaraju
VideoJail: Exploiting Video-Modality Vulnerabilities for Jailbreak Attacks on Multimodal Large Language Models (Poster) | Wenbo Hu · Shishen Gu · Youze Wang · Richang Hong
BaxBench: Can LLMs Generate Correct and Secure Backends? (Poster) | Mark Vero · Niels Mündler · Victor Chibotaru · Veselin Raychev · Maximilian Baader · Nikola Jovanović · Jingxuan He · Martin Vechev
SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging (Poster) | Aladin Djuhera · Swanand Kadhe · Farhan Ahmed · Syed Zawad · Holger Boche
LLM Neurosurgeon: Targeted Knowledge Removal in LLMs using Sparse Autoencoders (Poster) | Dylan Zhou · Kunal Patil · Yifan Sun · Karthik lakshmanan · Senthooran Rajamanoharan · Arthur Conmy
Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study (Poster) | Aryan Agrawal · Lisa Alazraki · Shahin Honarvar · Thomas Mensink · Marek Rei
A Generative Approach to LLM Harmfulness Detection with Red Flag Tokens (Poster) | Sophie Xhonneux · David Dobre · Mehrnaz Mofakhami · Leo Schwinn · Gauthier Gidel
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models (Poster) | Shengkang Wang · Hongzhan Lin · Ziyang Luo · Zhen Ye · Guang Chen · Jing Ma
Antipodal Pairing and Mechanistic Signals in Dense SAE Latents (Poster) | Alessandro Stolfo · Ben Wu · Mrinmaya Sachan
FiDeLiS: Faithful Reasoning in Large Language Models for Knowledge Graph Question Answering (Poster) | Yuan Sui · Yufei He · Nian Liu · Xiaoxin He · Kun Wang · Bryan Hooi
Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis (Poster) | Xu Wang · Yan Hu · Wenyu Du · Reynold Cheng · Wang Benyou · Difan Zou
Mind the Gap: A Practical Attack on GGUF Quantization (Poster) | Kazuki Egashira · Robin Staab · Mark Vero · Jingxuan He · Martin Vechev
Model Evaluations Need Rigorous and Transparent Human Baselines (Poster) | Kevin Wei · Patricia Paskov · Sunishchal Dev · Michael Byun · Anka Reuel · Xavier Roberts-Gaal · Rachel Calcott · Evie Coxon · Chinmay Deshpande
The Jailbreak Tax: How Useful are Your Jailbreak Outputs? (Poster) | Kristina Nikolić · Luze Sun · Jie Zhang · Florian Tramer
Towards Understanding Distilled Reasoning Models: A Representational Approach (Poster) | David Baek · Max Tegmark
AntifakePrompt: Prompt-Tuned Vision-Language Models are Fake Image Detectors (Poster) | You Ming Chang · Chen Yeh · Wei-Chen Chiu · Ning Yu
Reliable and Efficient Amortized Model-based Evaluation (Poster) | Sang Truong · Yuheng Tu · Percy Liang · Bo Li · Sanmi Koyejo
Integrated Gradients Provides Faithful Language Model Attributions for In-Context Learning (Poster) | Theo Datta · Erik Wang · Kayla Huang · Finale Doshi-Velez
Red Teaming for Trust: Evaluating Multicultural and Multilingual AI Systems in Asia-Pacific (Poster) | Akash Kundu · Adrianna Tan · Theodora Skeadas · Rumman Chowdhury · Sarah Amos
Top of the CLASS: Benchmarking LLM Agents on Real-World Enterprise Tasks (Poster) | Michael Wornow · Vaishnav Garodia · Vasilis Vassalos · Utkarsh Contractor
Self-Ablating Transformers: More Interpretability, Less Sparsity (Poster) | Jeremias Ferrao · Luhan Mikaelson · Keenan Pepper · Natalia Perez-Campanero
Patterns and Mechanisms of Contrastive Activation Engineering (Poster) | Yixiong Hao · Ayush Panda · Stepan Shabalin · Sheikh Abdur Raheem Ali
StochasTok: Improving Fine-Grained Subword Understanding in LLMs (Poster) | Anya Sims · Cong Lu · Klara Kaleb · Jakob Foerster · Yee Whye Teh
LLMs Lost in Translation: M-ALERT Uncovers Cross-Linguistic Safety Gaps (Poster) | Felix Friedrich · Simone Tedeschi · Patrick Schramowski · Manuel Brack · Roberto Navigli · Huu Nguyen · Bo Li · Kristian Kersting
No, Of Course I Can! Refusal Mechanisms Can Be Exploited Using Harmless Data (Poster) | Joshua Kazdan · Lisa Yu · Rylan Schaeffer · Chris Cundy · Sanmi Koyejo · Krishnamurthy Dvijotham
Black-Box Adversarial Attacks on LLM-Based Code Completion (Poster) | Slobodan Jenko · Niels Mündler · Jingxuan He · Mark Vero · Martin Vechev
Has My System Prompt Been Used? Large Language Model Prompt Membership Inference (Poster) | Roman Levin · Valeriia Cherepanova · Abhimanyu Hans · Avi Schwarzschild · Tom Goldstein
Unnatural Languages Are Not Bugs but Features for LLMs (Poster) | Keyu Duan · Yiran Zhao · Zhili Feng · Jinjie Ni · Tianyu Pang · Qian Liu · Tianle Cai · Longxu Dou · Kenji Kawaguchi · Anirudh Goyal · Zico Kolter · Michael Qizhe Shieh
MASAN: Enhancing Attack Stealth and Efficacy on Vision-Language Models via Smart Noise (Poster) | Shuaiqi Wang · Sayali Deshpande · Rajesh Kudupudi · Alireza Mehrtash · Danial Dashti
In-Context Meta Learning Induces Multi-Phase Circuit Emergence (Poster) | Gouki Gouki · Hiroki Furuta · Shohei Taniguchi · Yusuke Iwasawa · Yutaka Matsuo
Monitoring LLM Agents for Sequentially Contextual Harm (Poster) | Chen Yueh-Han · Nitish Joshi · Yulin Chen · He He · Rico Angell
Mechanistic Anomaly Detection for "Quirky" Language Models (Poster) | David Johnston · Arkajyoti Chakraborty · Nora Belrose
The Steganographic Potentials of Language Models (Poster) | Artem Karpov · Tinuade Adeleke · Seong Hah Cho · Natalia Perez-Campanero
How Does Entropy Influence Modern Text-to-SQL Systems? (Poster) | Varun Kausika · Chris Lazar · Satya Mishra · Saurabh Jha · Priyanka Pathak
Understanding (Un)Reliability of Steering Vectors in Language Models (Poster) | Joschka Braun · Carsten Eickhoff · David Krueger · Seyed Ali Bahrainian · Dmitrii Krasheninnikov
Working Memory Attack on LLMs (Poster) | Bibek Upadhayay · Vahid Behzadan · Amin Karbasi
Do Multilingual LLMs Think In English? (Poster) | Lisa Schut · Yarin Gal · Sebastian Farquhar
GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs (Poster) | Advik Raj Basani · Xiao Zhang
ASIDE: Architectural Separation of Instructions and Data in Language Models (Poster) | Egor Zverev · Evgenii Kortukov · Alexander Panfilov · Soroush Tabesh · Sebastian Lapuschkin · Wojciech Samek · Christoph Lampert
Latent Adversarial Training Improves the Representation of Refusal (Poster) | Alexandra Abbas · Nora Petrova · Hélios Lyons · Natalia Perez-Campanero
Can Knowledge Graphs Make Large Language Models More Trustworthy? An Empirical Study Over Open-ended Question Answering (Poster) | Yuan Sui · Yufei He · Zifeng Ding · Bryan Hooi
Evaluation of Large Language Models via Coupled Token Generation (Poster) | Nina Corvelo Benz · Stratis Tsirtsis · Eleni Straitouri · Ivi Chatzi · Ander Artola Velasco · Suhas Thejaswi · Manuel Gomez Rodriguez
The Differences Between Direct Alignment Algorithms are a Blur (Poster) | Alexey Gorbatovski · Boris Shaposhnikov · Viacheslav Sinii · Alexey Malakhov · Daniil Gavrilov
Diagnostic Uncertainty: Teaching Language Models to Describe Open-Ended Uncertainty (Poster) | Brian Sui · Jessy Lin · Michelle Li · Anca Dragan · Dan Klein · Jacob Steinhardt
Language Models Use Trigonometry to Do Addition (Poster) | Subhash Kantamneni · Max Tegmark
Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates (Poster) | Hui Wei · Shenghua He · Tian Xia · Fei Liu · Andy Wong · Jingyang Lin · Mei Han
Endive: A Cross-Dialect Benchmark for Fairness and Performance in Large Language Models (Poster) | Abhay Gupta · Jacob Cheung · Philip Meng · Shayan Sayyed · Austen Liao · Kevin Zhu · Sean OBrien
Rethinking LLM Bias Probing Using Lessons from the Social Sciences (Poster) | Kirsten Morehouse · Siddharth Swaroop · Weiwei Pan
Privately Learning from Graphs with Applications in Fine-tuning Large Pretrained Models (Poster) | Haoteng Yin · Rongzhe Wei · Eli Chien · Pan Li
Justified Trust in AI Fairness Assessment using Existing Metadata Entities (Poster) | Alpay Sabuncuoglu · Carsten Maple
Boosting Adversarial Robustness of Vision-Language Pre-training Models against Multimodal Adversarial Attacks (Poster) | Youze Wang · Wenbo Hu · Qin Li · Richang Hong
AdvBDGen: A Robust Framework for Generating Adaptive and Stealthy Backdoors in LLM Alignment Attacks (Poster) | Pankayaraj Pathmanathan · Udari Sehwag · Michael-Andrei Panaitescu-Liess · Furong Huang
MKA: Leveraging Cross-Lingual Consensus for Model Abstention (Poster) | Sharad Duwal
Automated Red Teaming with GOAT: the Generative Offensive Agent Tester (Poster) | Maya Pavlova · Erik Brinkman · Krithika Iyer · Vítor Albiero · Joanna Bitton · Hailey Nguyen · Cristian Ferrer · Ivan Evtimov · Aaron Grattafiori
Differentially Private Retrieval Augmented Generation with Random Projection (Poster) | Dixi Yao · Tian Li
Pruning as a Defense: Reducing Memorization in Large Language Models (Poster) | Mansi Gupta · Nikhar Waghela · Sarthak Gupta · Shourya Goel · Sanjif Shanmugavelu
Harmful Helper: Perform malicious tasks? Web AI agents might help (Poster) | Yang Fan Chiang · Seungjae (Jay) Lee · Jia-Bin Huang · Furong Huang · Yizheng Chen
CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models (Poster) | Yuetai Li · Zhangchen Xu · Fengqing Jiang · Luyao Niu · Dinuka Sahabandu · Bhaskar Ramasubramanian · Radha Poovendran
Steering Fine-Tuning Generalization with Targeted Concept Ablation (Poster) | Helena Casademunt · Caden Juang · Samuel Marks · Senthooran Rajamanoharan · Neel Nanda
Disentangling Sequence Memorization and General Capability in Large Language Models (Poster) | Gaurav Ghosal · Pratyush Maini · Aditi Raghunathan
Automated Feature Labeling with Token-Space Gradient Descent (Poster) | Julian Schulz · Seamus Fallows
Scalable Fingerprinting of Large Language Models (Poster) | Anshul Hemant Nasery · Jonathan Hayase · Creston Brooks · Peiyao Sheng · Himanshu Tyagi · Pramod Viswanath · Sewoong Oh
ToolScan: A Benchmark For Characterizing Errors In Tool-Use LLMs (Poster) | Shirley Kokane · Ming Zhu · Tulika Awalgaonkar · Jianguo Zhang · Akshara Prabhakar · Thai Hoang · Zuxin Liu · Rithesh Ramapura Narasimha Murthy · Liangwei Yang · Weiran Yao · Juntao Tan · Zhiwei Liu · Shelby Heinecke · Huan Wang · Juan Carlos Niebles · Caiming Xiong · Silvio Savarese
AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security (Poster) | Zikui Cai · Shayan Shabihi · Bang An · Zora Che · Brian Bartoldson · Bhavya Kailkhura · Tom Goldstein · Furong Huang
Unlocking Hierarchical Concept Discovery in Language Models through Geometric Regularization (Poster) | T. Ed Li · Junyu Ren