TL00053 "Giant Language Model Test Room (GLTR)"

URL: http://gltr.io

External group:

Category:

Disinformation use: Designed for disinfo

Cogseccollab use:

Function: text forensics

Code_url:

Artifacts: text

Automation:

Platform:

Accessibility:

Summary: The aim of GLTR is to use the same models that generate fake text as a tool for detecting it. GLTR has access to the GPT-2 117M language model from OpenAI, one of the largest publicly available models at the time of GLTR's release. It accepts any text input and analyzes what GPT-2 would have predicted at each position. Since the model's output at each position is a ranking of all the words it knows, we can compute where the word that actually follows falls in that ranking. We use this rank to overlay a colored mask on the text: a word among the top 10 most likely is highlighted in green, top 100 in yellow, top 1,000 in red, and all remaining words in purple. This gives a direct visual indication of how likely each word was under the model.
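
Example: the rank-to-color step described in the summary can be sketched in a few lines of Python. This is an illustration only, not GLTR's actual code. The function names (rank_of_observed, color_for_rank) and the toy score table are assumptions made for this example; in GLTR the scores would come from GPT-2's prediction at each position.

```python
# Illustrative sketch only; names and the toy scores are hypothetical,
# not taken from the GLTR code base.

def rank_of_observed(scores, observed):
    """Return the 1-based rank of `observed` among all candidate words,
    ordered from most to least likely under the model."""
    target = scores[observed]
    # Rank = 1 + number of candidates the model scored strictly higher.
    return 1 + sum(1 for s in scores.values() if s > target)


def color_for_rank(rank):
    """Map a 1-based rank to the highlight color given in the summary."""
    if rank <= 10:
        return "green"
    if rank <= 100:
        return "yellow"
    if rank <= 1000:
        return "red"
    return "purple"


# Toy next-word scores for a single position (made-up numbers).
toy_scores = {"the": 0.5, "a": 0.3, "cat": 0.2}
rank = rank_of_observed(toy_scores, "a")
print(rank, color_for_rank(rank))  # prints: 2 green
```

With a real model the vocabulary has tens of thousands of entries, so the red and purple buckets become reachable. A passage dominated by green and yellow suggests that each word was highly predictable to the model, which is the visual signal the tool relies on.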