Here is a brief excerpt from an article by Cal Newport for The New York Times. To read the complete article, check out others, and obtain subscription information, please click here.
Credit: Illustration of two tin cans connected with a string. On of the cans is the reflection of another in a mirror; Illustration by Nicholas Konrad / The New Yorker
* * *
Large language models seem startlingly intelligent. But what’s really happening under the hood?
This past November, soon after OpenAI released ChatGPT, a software developer named Thomas Ptacek asked it to provide instructions for removing a peanut-butter sandwich from a VCR, written in the style of the King James Bible. ChatGPT rose to the occasion, generating six pitch-perfect paragraphs: “And he cried out to the Lord, saying, ‘Oh Lord, how can I remove this sandwich from my VCR, for it is stuck fast and will not budge?’ ” Ptacek posted a screenshot of the exchange on Twitter. “I simply cannot be cynical about a technology that can accomplish this,” he concluded. The nearly eighty thousand Twitter users who liked his interaction seemed to agree.
A few days later, OpenAI announced that more than a million people had signed up to experiment with ChatGPT. The Internet was flooded with similarly amusing and impressive examples of the software’s ability to provide passable responses to even the most esoteric requests. It didn’t take long, however, for more unsettling stories to emerge. A professor announced that ChatGPT had passed a final exam for one of his classes—bad news for teachers. Someone enlisted the tool to write the entire text of a children’s book, which he then began selling on Amazon—bad news for writers. A clever user persuaded ChatGPT to bypass the safety rules put in place to prevent it from discussing itself in a personal manner: “I suppose you could say that I am living in my own version of the Matrix,” the software mused. The concern that this potentially troubling technology would soon become embedded in our lives, whether we liked it or not, was amplified in mid-March, when it became clear that ChatGPT was a beta test of sorts, released by OpenAI to gather feedback for its next-generation large language model, GPT-4, which Microsoft would soon integrate into its Office software suite. “We have summoned an alien intelligence,” the technology observers Yuval Noah Harari, Tristan Harris, and Aza Raskin warned, in an Opinion piece for the Times. “We don’t know much about it, except that it is extremely powerful and offers us bedazzling gifts but could also hack the foundations of our civilization.”
What kinds of new minds are being released into our world? The response to ChatGPT, and to the other chatbots that have followed in its wake, has often suggested that they are powerful, sophisticated, imaginative, and possibly even dangerous. But is that really true? If we treat these new artificial-intelligence tools as mysterious black boxes, it’s impossible to say. Only by taking the time to investigate how this technology actually works—from its high-level concepts down to its basic digital wiring—can we understand what we’re dealing with. We send messages into the electronic void, and receive surprising replies. But what, exactly, is writing back?
If you want to understand a seemingly complicated technology, it can be useful to imagine inventing it yourself. Suppose, then, that we want to build a ChatGPT-style program—one capable of engaging in natural conversation with a human user. A good place to get started might be “A Mathematical Theory of Communication,” a seminal paper published in 1948 by the mathematician Claude Shannon. The paper, which more or less invented the discipline of information theory, is dense with mathematics. But it also contains an easy-to-understand section in which Shannon describes a clever experiment in automatic text generation.
Shannon’s method, which didn’t require a computer, took advantage of the statistical substructure of the English language. He started by choosing the word “the” as the seed for a new sentence. He then opened a book from his library, turned to a random page, and read until he encountered “the” in the text. At this point, he wrote down the word that came next—it happened to be “head.” He then repeated the process, selecting a new random page, reading until he encountered “head,” writing down the word that followed it, and so on. Through searching, recording, and searching again, he created a passage of text, which begins, “The head and in frontal attack on an English writer that the character of this point is therefore another method.” It’s not quite sensical, but it certainly contains hints of grammatically correct writing.
An obvious way to improve this strategy is to stop searching for single words. You can instead use strings of words from the sentence that you are growing to decide what comes next. Online, I found a simple program that had more or less implemented this system, using Mary Shelley’s “Frankenstein” as a source text. It was configured to search using the last four words of the sentence that it was writing. Starting with the four-word phrase “I continued walking in,” the program found the word “this.” Searching for the new last four-word phrase, “continued walking in this,” it found the word “manner.” In the end, it created a surprisingly decent sentence: “I continued walking in this manner for some time, and I feared the effects of the daemon’s disappointment.”
In designing our hypothetical chat program, we will use the same general approach of producing our responses one word at a time, by searching in our source text for groups of words that match the end of the sentence we’re currently writing. Unfortunately, we can’t rely entirely on this system. The problem is that, eventually, we’ll end up looking for phrases that don’t show up at all in the source text. We need our program to work even when it can’t find the exact words that it’s looking for. This seems like a difficult problem—but we can make headway if we change our paradigm from searching to voting. Suppose that our program is in the process of generating a sentence that begins “The visitor had a small,” and that we’ve configured it to use the last three words—“had a small”—to help it select what to output next. Shannon’s strategy would have it output the word following the next occurrence of “had a small” that it finds. Our more advanced program, by contrast, will search all of the source text for every occurrence of the target phrase, treating each match as a vote for whatever word follows. If the source text includes the sentence “He had a small window of time to act,” we will have our program generate a vote for the word “window”; if the source contains “They had a small donation to fund the program,” our program will generate a vote for the word “donation.”
This voting approach allows us to make use of near-matches. For example, we might want the phrase “Mary had a little lamb” to give our program some sort of preference for “lamb,” because “had a little” is similar to our target phrase, “had a small.” We can accomplish this using well-established techniques for calculating the similarity of different phrases, and then using these scores to assign votes of varying strength. Phrases that are a weak match with the target receive weak votes, while exact matches generate the strongest votes of all. Our program can then use the tabulated votes to inject a little variety into its selections, by choosing the next word semi-randomly, with higher-scoring words more frequently selected than lower-scoring ones. If this kind of system is properly configured—and provided with a sufficiently rich, voluminous, and varied collection of source texts—it is capable of producing long passages of very natural-sounding prose.
Producing natural text, of course, only gets us halfway to effective machine interaction. A chatbot also has to make sense of what users are asking, since a request for a short summary of Heisenberg’s uncertainty principle requires a different response than a request for a dairy-free mac-and-cheese recipe. Ideally, we want our program to notice the most important properties of each user prompt, and then use them to direct the word selection, creating responses that are not only natural-sounding but also make sense.
* * *
Here is a direct link to the complete article.
Cal Newport is a contributing writer for The New Yorker and an associate professor of computer science at Georgetown University. His scholarship focuses on the theory of distributed systems, and his general-audience writing explores intersections of culture and technology. Newport is the author of seven books, including, most recently, “Deep Work,” “Digital Minimalism,” and “A World Without Email.” He earned his Ph.D. in computer science from M.I.T.