Chatbots: Who are you really talking to?

At some point, you have probably opened a website and seen a chat box pop up at the side of the page with a message from someone asking how they could assist you. Have you ever wondered who was on the other side of that conversation? Maybe it was a human, or maybe it was a bot. How would you even tell the difference between the two? Traveling back in time to the middle of the 20th century may help answer this question.

In 1950, computer scientist Alan Turing published an article titled “Computing Machinery and Intelligence,” in which he first contemplated the question of whether machines can think. He found that question too ambiguous to answer directly, and instead asked whether a computer could communicate in a manner nearly indistinguishable from a human’s: Could a computer successfully make a person think they are speaking to another person?

Turing tested this by proposing the “imitation game,” now known as the Turing test. The game involves three players: A, B, and C. One of players A and B is a computer and the other is a human; player C must determine which is which by asking them questions and judging how “human” each player’s answers sound. If the computer can trick player C into thinking it is human, it passes the Turing test.
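The structure of the game can be sketched in a few lines of Python. Every name and canned answer below is hypothetical, invented purely for illustration; the point is the protocol, not the players:

```python
import random

# A toy rendering of the imitation game's structure (all names and
# canned answers are hypothetical, for illustration only).
def human_answer(question: str) -> str:
    return "Honestly, it depends on my mood that day."

def computer_answer(question: str) -> str:
    return "It depends on external conditions."

def imitation_game(guesser) -> bool:
    # Randomly seat the human and the computer as player A or player B.
    seats = {"A": human_answer, "B": computer_answer}
    if random.random() < 0.5:
        seats = {"A": computer_answer, "B": human_answer}
    # Player C sees only the answers, never who wrote them.
    transcript = {seat: player("What do you enjoy about spring?")
                  for seat, player in seats.items()}
    guess = guesser(transcript)  # C names the seat it believes is the computer
    truth = "A" if seats["A"] is computer_answer else "B"
    return guess == truth  # True means C unmasked the computer

# A player C who guesses at random unmasks the computer about half the
# time, which is the chance baseline any real judge must beat.
random.seed(0)
games = [imitation_game(lambda t: random.choice(["A", "B"])) for _ in range(1000)]
print(sum(games) / len(games))  # close to 0.5
```

A judge who cannot do better than this 50% baseline is, in Turing’s terms, failing to tell machine from human.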

Artificial intelligence (AI) powers chatbots: software that converses with humans, or with other bots, to simulate human interaction. Chatbots can be put through the Turing test to see whether they behave like humans. An early example is ELIZA, a chatbot created at MIT in the 1960s by Joseph Weizenbaum. ELIZA was designed to play the role of a psychotherapist that users could talk to about their problems. It crafted responses by pinpointing certain keywords in the user’s input or by rephrasing the user’s statement as a question. Although ELIZA could give only a limited range of responses and was not capable of in-depth conversation, many users still had a hard time realizing they were talking to a piece of technology.
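ELIZA’s keyword-and-rephrase technique can be approximated in a few lines of Python. The rules below are hypothetical stand-ins, not Weizenbaum’s original script, but they show the same trick: match a keyword pattern, flip the pronouns, and echo the user’s own words back as a question:

```python
import re

# Minimal ELIZA-style rules (hypothetical, not Weizenbaum's originals).
# Each rule pairs a keyword pattern with a template that rephrases
# part of the user's input as a follow-up question.
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE), "Tell me more about your {0}."),
]

# Swap first- and second-person words so the echo reads as a reply.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are"}

def reflect(fragment: str) -> str:
    words = [REFLECTIONS.get(w.lower(), w) for w in fragment.split()]
    return " ".join(words).rstrip(".!?")

def respond(user_input: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(user_input)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please, go on."  # content-free fallback when no keyword matches

print(respond("I am worried about my exams"))
# → Why do you say you are worried about your exams?
```

Note that the program understands nothing: it never models what “exams” or “worry” mean, yet the reflected question feels attentive, which is exactly the illusion that fooled ELIZA’s users.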

ELIZA was never expected to pass the test, since it could not express even basic emotions, yet users grew attached to it, confiding in the software as if it were their closest friend. This tendency to ascribe human characteristics to AI came to be called the “ELIZA effect.” Weizenbaum himself warned against letting machines make human choices, arguing that it was both wrong and dangerous to give computers such power.

Now, some researchers think GPT-4 may be able to pass the Turing test, since it can trick people into thinking they are speaking with a human in short conversations. In May 2023, a lab in Tel Aviv had 1.5 million players chat with either other humans or with a chatbot powered by large language models (the same type of model behind GPT-4) that was built to mimic human behavior. Players identified the bots only around 60% of the time, which is not much better than chance.
This suggests that the Turing test may not be the best way to evaluate AI. In fact, Mustafa Suleyman, former head of applied AI at Google’s AI research lab DeepMind, expressed uncertainty regarding the effectiveness of the Turing test. In his book “The Coming Wave: Technology, Power, and the Twenty-first Century’s Greatest Dilemma,” he wrote, “It doesn’t tell us anything about what the system can do or understand, anything about whether it has established complex inner monologues or can engage in planning over abstract time horizons, which is key to human intelligence.”

Alternatives to the Turing test exist. OpenAI, the group that created GPT-4, has its own set of benchmarks that include coding, reading comprehension, and mathematics tests. GPT-4 has done well on these, too: it scored around the 90th percentile on the bar exam and around the 80th percentile on the GRE. Whether these benchmarks are truly informative is still debated, but they probe a wider range of abilities than Turing’s single game. Either way, it is clear that it is time for new puzzles to test AI.