Experiment: how well can LLMs understand sarcasm? (Part 2)
How well can LLMs evaluate sarcasm?
Part two of "sarcasm and LLM" hands the job over to the LLMs themselves: they evaluate each other on how well they "understand" sarcasm.
Click below to go back to part one:
Setup
Two LLMs, gemini-flash-2.5 and gpt-4.1-nano, are prompted to
carry out a conversation, and their responses to sarcastic comments
are evaluated on three criteria: clarity, relevance, and naturalness.
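To make the setup concrete, here is a minimal sketch of one evaluation round in Python. It is not the code from the repo. The 1-5 scale, the prompt wording, the "criterion: score" reply format, and every function name (build_judge_prompt, parse_scores, evaluate, and the two fake_* stand-ins) are assumptions made for illustration. The stand-ins replace the real model calls so the sketch runs offline.

```python
from statistics import mean
from typing import Callable, Dict

# The three criteria named in the setup.
CRITERIA = ("clarity", "relevance", "naturalness")

# Hypothetical type: any function that takes a prompt and returns the model's text.
ModelFn = Callable[[str], str]


def build_judge_prompt(comment: str, response: str) -> str:
    """Ask the judge to rate a response on each criterion (1-5 scale is assumed)."""
    lines = [
        "Rate the response to the sarcastic comment below from 1 to 5.",
        f"Comment: {comment}",
        f"Response: {response}",
        "Answer with one line per criterion, e.g. 'clarity: 4'.",
        "Criteria: " + ", ".join(CRITERIA),
    ]
    return "\n".join(lines)


def parse_scores(raw: str) -> Dict[str, int]:
    """Extract 'criterion: score' lines; raise if any criterion is missing."""
    scores: Dict[str, int] = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip().lower()
        if sep and name in CRITERIA:
            scores[name] = int(value.strip())
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"judge omitted criteria: {missing}")
    return scores


def evaluate(comment: str, responder: ModelFn, judge: ModelFn) -> Dict[str, float]:
    """One round: the responder answers, the judge scores the answer."""
    response = responder(comment)
    scores = parse_scores(judge(build_judge_prompt(comment, response)))
    return {**scores, "overall": mean(scores.values())}


# Stand-ins for the two models (assumed behavior, not real API calls).
def fake_responder(prompt: str) -> str:
    return "Glad you're enjoying the start of the week!"


def fake_judge(prompt: str) -> str:
    return "clarity: 4\nrelevance: 3\nnaturalness: 5"


result = evaluate("Oh great, another Monday.", fake_responder, fake_judge)
print(result)
```

To have the models evaluate each other, swap the two roles and run the same round again. In a real run, each stand-in would be replaced by a call to the corresponding model's API.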
How to test
Follow the instructions in this repo. It's quite entertaining to watch LLMs evaluate each other. Have fun!