“…he listened eagerly to the story of her life and she was equally eager to hear the story of his, but although they had a clear understanding of the logical meaning of the words they exchanged, they failed to hear the semantic murmurs of the river flowing through them.”
A compelling exploration of conversational disconnect comes from Milan Kundera’s “The Unbearable Lightness of Being.” In it, he describes the semantic gap between two lovers. Their experiences led each to form strong associations with words and concepts, and the differences in those associations drive their dialogue apart, as if they spoke different languages altogether. Like Kundera’s lovers, we each accumulate experiences that individuate how we bind memories to words. Maybe for you, “basketball” evokes memories of the NBA playoff run you watched with friends. For someone else, it evokes the shame of an air-balled free throw in his first middle-school game. These ingrained associations give language its poetic flavor, but they are also baggage clouding our mutual comprehension, preventing us from being fully understood.
Many great thinkers have attempted to quantify and clarify the meaning of speech. Some have argued that language can only be understood as a biological phenomenon; others, that trying to define meaning only creates a never-ending chain of questions. Recent advances in large language models (LLMs) have pulled these old arguments back into the spotlight. Google Translate, Microsoft Copilot, and ChatGPT are well-known names that convert our seemingly nebulous word associations into good old math. LLMs analyze how often words appear near one another in text, converting each word into a long list of numbers, a vector, that places it relative to other words in a high-dimensional space known as an embedding space. Converting words to numbers lets us translate between languages and generate coherent text and speech. As this technology matured, we saw massive adoption of language models for writing assistance. A 2023 survey found that 56% of college students reported using AI to assist with their coursework. Emphasis on “reported”: since use of these tools is stigmatized, actual rates are likely higher.
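To make that idea concrete, here is a minimal sketch of learning word vectors from co-occurrence using the open-source gensim library rather than a full LLM. The toy corpus, the 50-dimension setting, and the other parameters are illustrative assumptions, not anything a production model actually uses.

```python
# Minimal sketch: learning word vectors from co-occurrence in a toy corpus.
# Word2Vec is a classic word-embedding method, far simpler than a modern LLM,
# but it illustrates the same idea: words that appear in similar contexts
# end up near each other in a high-dimensional vector space.
from gensim.models import Word2Vec

# Hypothetical corpus: each inner list is one tokenized "document".
corpus = [
    ["we", "watched", "the", "basketball", "playoffs", "with", "friends"],
    ["he", "missed", "the", "free", "throw", "in", "his", "first", "game"],
    ["the", "playoffs", "ended", "with", "a", "buzzer", "beater"],
    ["friends", "gathered", "to", "watch", "the", "game"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=200, seed=0)

vec = model.wv["basketball"]                          # a 50-dimensional vector for "basketball"
print(vec.shape)                                      # (50,)
print(model.wv.similarity("basketball", "playoffs"))  # cosine similarity between two words
```

Words that keep showing up in similar contexts, like “basketball” and “playoffs” here, end up with nearby vectors; inside the model, that proximity is what “meaning” amounts to.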
In contrast to earlier beliefs about language, the word relations within embedding spaces have arithmetic and geometric properties. In a famous study, researchers found that by subtracting the vector for “man” from “king” and adding the vector for “woman,” one arrived at a vector most closely approximating that for “queen.” Since then, many interesting regularities have been discovered in embedding spaces, including representations of functional relationships like “x is the capital of y” and the ability to delete whole concepts from an embedding space while leaving the remaining relations intact. This is significant because it reveals the underlying structure and regularity of meaning within language, suggesting that these spaces encode not just word associations but deeper conceptual relationships.
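The same arithmetic can be reproduced with publicly available pretrained vectors. The sketch below uses the GloVe embeddings bundled with gensim’s downloader, not the embeddings from the original study, so the exact nearest neighbors may differ from the published result.

```python
# Sketch of the king - man + woman ≈ queen arithmetic, using pretrained GloVe
# vectors fetched through gensim's downloader.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # downloads ~66 MB of 50-d GloVe vectors

# most_similar adds the "positive" vectors, subtracts the "negative" ones,
# and returns the nearest words by cosine similarity.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# The same trick probes the "x is the capital of y" relation:
# paris - france + germany should land near berlin.
print(vectors.most_similar(positive=["paris", "germany"], negative=["france"], topn=3))
```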
Despite mass adoption, there is still a critical issue with language models today: their text comes across as robotic. These models are typically trained on terabytes of internet data. This creates a generality that is ideal when training a language model to understand the semantics of common English. However, the relations it learns differ from those of any individual’s vocabulary. Whereas human semantics carry idiosyncrasies shaped by nuanced life experiences, broad sampling of internet text yields a model without distinctiveness, the very quality that gives language its humanity. In most respects this is desirable, because it creates a foundational embedding space that can be expanded upon.
The process of fine-tuning lets us augment these embedding spaces with new data. We take an existing LLM and train it further on additional text, adapting its representations to better reflect the co-occurrence of words in the new data. In this way, we can tune a language model’s representations to reflect not just English broadly, but our personal style of language: the semantics we’ve learned over time, which act as a unique cognitive fingerprint. The underlying structure that LLM training extracts from language reveals information that can greatly aid our ability to connect and communicate. As we shift from standard LLMs to personalized models, fine-tuned to reflect the nuances of our own language use, we’ll go beyond making machines understand us better; we’ll understand each other better too.
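As a rough sketch of what that looks like in practice, the snippet below continues training a small open model (GPT-2 here, standing in for any causal language model) on a plain-text file of one’s own writing, using the Hugging Face transformers and datasets libraries. The file name my_writing.txt and the hyperparameters are placeholders, not a recipe.

```python
# Sketch: fine-tune a small causal language model on a file of personal text
# so its representations drift toward the writer's own word usage.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"  # small base model; any causal LM checkpoint would work
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# "my_writing.txt" is a hypothetical file of personal text (emails, journals, essays).
dataset = load_dataset("text", data_files={"train": "my_writing.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# mlm=False gives the standard next-word (causal) training objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="personal-lm",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        learning_rate=5e-5,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()  # continues training, nudging the embedding space toward the writer's usage
```

With so little data the model will not become a faithful copy of anyone’s voice; the point is only that the same mechanism that built the generic embedding space can keep reshaping it around an individual’s text.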
- International Conference on Learning Representations (2024). DOI: 10.48550/arXiv.2310.15213
- IEEE/CVF Winter Conference on Applications of Computer Vision (2023). DOI: 10.48550/arXiv.2308.14761
- Association for Computational Linguistics (2016). DOI: 10.48550/arXiv.1509.01692
- Advances in Neural Information Processing Systems (2013). DOI: 10.48550/arXiv.1310.4546