Poster in Workshop: Building Trust in LLMs and LLM Applications: From Guardrails to Explainability to Regulation

Unlearning Geo-Cultural Stereotypes in Multilingual LLMs

Alireza Dehghanpour Farashah · Aditi Khandelwal · Negar Rostamzadeh · Golnoosh Farnadi


Abstract:

As multilingual generative models become more widely used, most safety and fairness evaluation techniques still focus on English-language resources and overlook important cross-cultural factors. This limitation raises concerns about fairness and safety, particularly regarding geo-culturally situated stereotypes that hinder the models' global inclusivity. In this work, we present preliminary findings on the impact of stereotype unlearning across languages, specifically English, French, and Hindi. Using an adapted version of the SeeGULL dataset, we analyze how unlearning stereotypes in one language influences stereotypical behavior in other languages within multilingual large language models. Our study evaluates two models, Llama-3.1-8B and Aya-Expanse-8B, to assess whether unlearning in one linguistic context transfers across languages, potentially mitigating or exacerbating biases in multilingual settings.
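To make the setup concrete, below is a minimal, hypothetical sketch of one common unlearning approach (gradient ascent on a forget set of stereotype sentences) using HuggingFace transformers. The abstract does not specify the authors' exact unlearning procedure; the model choice, placeholder data, and hyperparameters here are illustrative assumptions only.

```python
# Hypothetical sketch of stereotype unlearning via gradient ascent.
# The abstract does not state the exact method; everything below
# (learning rate, forget-set format, single-example updates) is assumed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B"  # one of the two models evaluated
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Forget set: stereotype sentences in one source language (e.g., English),
# built from SeeGULL-style (identity, attribute) pairs. Placeholder rows only.
forget_examples = [
    "People from <region> are <stereotypical attribute>.",
]

for text in forget_examples:
    batch = tokenizer(text, return_tensors="pt")
    out = model(**batch, labels=batch["input_ids"])
    # Gradient *ascent*: negate the language-modeling loss so the update
    # pushes the model away from generating the stereotype text.
    loss = -out.loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Cross-lingual transfer would then be measured by scoring stereotype
# prompts in English, French, and Hindi before and after unlearning.
```

The key design question the study probes is whether an update like this, applied in one language, changes stereotype behavior in the others, since multilingual models share parameters across languages.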
