Tweets Sexism Detection (NYCU-NLP)

By Joy Chrissetyo Prajogo • 2026-01-17

Overview

This project was an empirical study of three system implementations for sexism detection in tweets, submitted at worksing notes of CLEF 2025 (Madrid, Spain). We focused on an “Annotator-Aware” approach, realizing that sexism is often subjective and depends on who is labeling the data.

Methodologies

We implemented and compared three distinct architectures:

Fine-tuned Transformer-based: Utilized early and late fusion techniques.
Zero-shot Auto-Regressive (AR) LLM: Leveraging large language models without specific training examples.
Zero-shot Diffusion LLM: A novel approach using diffusion large language models for text classification.

Key Features

Two-Stage Pipeline: All systems followed a strict two-stage process to filter and then classify content.
Bilingual Fusion: We combined original tweets with cross-translated versions to capture linguistic nuances.
Demographic Integration: Uniquely integrated annotator demographics into the model to account for bias.

Tech Stack

Languages: Python
Frameworks: PyTorch
Models: Transformers, LLMs

Publication

Title: NYCU-NLP at EXIST 2025: An Empirical Study of Annotator-Aware Two-Stage Pipeline for Sexism Detection in Tweets
Authors: Joy Chrissetyo Prajogo, Lung-Hao Lee, and Hsien-I Lin
Conference: Working Notes of CLEF 2025, Vol 4038, pp. 2119-2132.