For a few years now, computers have been getting better at generating text (and images). Since OpenAI (an Elon Musk–funded private artificial intelligence research lab) published its GPT-3 (Generative Pre-trained Transformer 3) language model last year, hype around its capabilities has been growing, and many similar and even more powerful models have been created. Language models like this one are computer programs that can generate human-sounding text with very little input. The current state-of-the-art models (like GPT-3) can produce extremely detailed and on-topic output.
Many of these models are available in online playgrounds for journalists and researchers to try out. Most articles on this topic include a quote or two created by this technology, and more often than not, the journalist first attributes it to a real person, only to later reveal that a computer wrote it. The reveal is usually met with equal parts praise and skepticism and, sometimes, fear. This reaction makes sense; computer language models are now extremely good at mimicking human language. They can produce full paragraphs and “answer” questions posed to them in plain text.
As highlighted by the now-famous “Stochastic Parrots” paper by Dr. Emily M. Bender and Dr. Timnit Gebru, these language models have several disadvantages, including encoding and spreading the biases present in their training data, as well as the enormous computing cost (and environmental impact) of creating them. Each of these weaknesses poses real dangers, and the need for fair and environmentally conscious AI is being addressed in research communities such as the ACM Conference on Fairness, Accountability, and Transparency and in work such as Roy Schwartz and colleagues’ “Green AI.” However, the hype around these language models has pushed them into products like GitHub’s Copilot, a software coding assistant, and into many community-created demos applying the technology to all kinds of new problems.
These language models work by going through millions of text documents (mostly in English) and codifying words, their forms, frequencies, and usage into numbers and probabilities; they do not codify meaning. In their most basic form, these models simply predict the next word in a sentence; then, they chain together several predictions to generate arbitrarily long text, as in the sketch below. When we read this text, we imbue it with coherence and intention, but this is merely an illusion.
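To make that loop concrete, here is a deliberately tiny sketch: a bigram model that only counts which word tends to follow which. It is nothing like GPT-3’s neural network, and the function names (train_bigram, generate) and the toy corpus are invented for illustration, but the core mechanic is the same: turn observed frequencies into probabilities, then repeatedly sample a next word.

```python
# Minimal sketch of next-word prediction: count word-to-word frequencies,
# then chain sampled predictions to generate text. Illustrative only.
import random
from collections import defaultdict, Counter

def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count how often each word follows another across the corpus."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            counts[current][nxt] += 1
    return counts

def generate(counts: dict[str, Counter], start: str, length: int = 10) -> str:
    """Chain predictions: sample each next word from the counts of words
    previously seen after the current one."""
    word = start
    output = [word]
    for _ in range(length):
        followers = counts.get(word)
        if not followers:
            break  # no observed continuation; stop generating
        candidates, weights = zip(*followers.items())
        word = random.choices(candidates, weights=weights)[0]
        output.append(word)
    return " ".join(output)

corpus = [
    "the model predicts the next word",
    "the model encodes word frequencies as probabilities",
]
counts = train_bigram(corpus)
print(generate(counts, "the"))
```

Nothing in this loop represents what any of the words mean; the program only tracks which strings of characters tend to follow which, which is why its output can sound fluent without being understood by the machine.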
Some people argue that we just need to add more parameters and data to the models, and eventually they will learn enough to be used as General AI: computer programs that can solve any problem with minimal reconfiguration or reprogramming. It might be possible to use this approach to create a system that passes the Turing Test, which tests whether people can distinguish between a computer and a human actor. However, a computer that seems human does not necessarily think on its own. If fed enough data, it might just repeat something it has already seen that was said by an actual human (in the AI field, this is known as overfitting).
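The memorization point can be seen in miniature with the toy sketch above (again, purely illustrative; large models fail in subtler ways): when the training data is a single sentence, every “prediction” has exactly one choice, so the model can only echo back what a human already wrote.

```python
# With only one training sentence, each word has exactly one observed
# follower, so "generation" replays the human-written text verbatim.
counts = train_bigram(["computers do not understand the words they produce"])
print(generate(counts, "computers", length=7))
# -> computers do not understand the words they produce
```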
In many applications, the use of computer-generated text is not clearly disclosed, and users might think a human is behind the conversation. The text might appear to explain things and follow logical reasoning, but only by chance. The output of these models is unpredictable; it might change topics at random, or it might steer a conversation toward topics the model is biased toward, possibly causing harm (such as furthering biases and discrimination against protected groups).
General AI might be on the horizon, since some breakthroughs in more complex designs for language models seem to give computers the ability to make decisions. Yet what we currently have is a poor approximation of a kid repeating words they heard on TV without understanding their meaning: the computer’s output is synthesized text that we, as humans, can interpret and give meaning to, but the computer does not understand what it is saying or the meaning behind any of the words and sentences it parrots out.
Bender, E. M., Gebru, T., et al. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (2021). DOI: 10.1145/3442188.3445922
Schwartz, R., Dodge, J., Smith, N. A., and Etzioni, O. “Green AI.” Communications of the ACM 63, 12 (2020). DOI: 10.1145/3381831