Experiment: how well can LLMs understand sarcasm? (Part 2)

How well can LLMs evaluate sarcasm?

Part two of "sarcasm and LLMs" hands the ball to the LLMs themselves, asking them to evaluate each other on how well they "understand" sarcasm.

Click the link below to go back to Part 1:

Part 1: sarcasm and text-based LLMs

Setup

Two LLMs, gemini-flash-2.5 and gpt-4.1-nano, are prompted to carry out a conversation, and their responses to sarcastic comments are evaluated on three criteria: clarity, relevance, and naturalness.
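To make the scoring step concrete, here is a minimal, hypothetical sketch of how a judge model's verdict on one reply might be requested, parsed and averaged. It is not the code from the repo. The 1-5 scale, the prompt wording, the "criterion: score" output format and every name below (`build_judge_prompt`, `parse_scores`, `Judgement`, `summarize`) are assumptions. No model API is called: the judge's raw reply is stubbed with a literal string.

```python
from dataclasses import dataclass
from statistics import mean

# The three criteria named in the setup.
CRITERIA = ("clarity", "relevance", "naturalness")


@dataclass
class Judgement:
    responder: str          # model that replied to the sarcastic comment
    judge: str              # model that scored the reply
    scores: dict[str, int]  # criterion -> score (assumed 1-5 scale)


def build_judge_prompt(sarcastic_comment: str, reply: str) -> str:
    """Hypothetical prompt asking the judge model to score one reply."""
    lines = [
        "The following comment is sarcastic.",
        f"Comment: {sarcastic_comment}",
        f"Reply: {reply}",
        "Rate the reply from 1 (poor) to 5 (excellent) on each criterion,",
        "one per line, formatted as 'criterion: score':",
    ]
    lines += [f"- {c}" for c in CRITERIA]
    return "\n".join(lines)


def parse_scores(judge_output: str) -> dict[str, int]:
    """Extract 'criterion: score' lines; raise if any criterion is missing."""
    scores: dict[str, int] = {}
    for line in judge_output.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip().lstrip("- ").lower()
        if sep and name in CRITERIA:
            scores[name] = int(value.strip())
    missing = set(CRITERIA) - set(scores)
    if missing:
        raise ValueError(f"judge did not score: {sorted(missing)}")
    return scores


def summarize(judgements: list[Judgement]) -> dict[str, dict[str, float]]:
    """Average each criterion per responding model."""
    by_model: dict[str, list[Judgement]] = {}
    for j in judgements:
        by_model.setdefault(j.responder, []).append(j)
    return {
        model: {c: mean(j.scores[c] for j in js) for c in CRITERIA}
        for model, js in by_model.items()
    }


if __name__ == "__main__":
    comment = "Oh great, another meeting. Just what I needed."
    prompt = build_judge_prompt(comment, "Sounds like your calendar is thriving!")
    # Stand-in for the judge model's raw text output (no API call is made here).
    fake_judge_output = "clarity: 4\nrelevance: 5\nnaturalness: 3"
    j = Judgement("gpt-4.1-nano", "gemini-flash-2.5", parse_scores(fake_judge_output))
    print(prompt)
    print(summarize([j]))
```

Requiring an explicit "criterion: score" format and failing loudly on missing criteria keeps a chatty judge model from silently producing partial scores.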

How to test

Follow the instructions in this repo. Watching LLMs evaluate each other is quite entertaining. Have fun!