Millions of people now have access to artificial intelligence like ChatGPT. After Apple Intelligence integrated ChatGPT into its platform, anyone with an iPhone, iPad or Mac can now ask complex questions without going to a separate app or website.
This long-awaited integration may spark questions like, how does ChatGPT work?
What are chatbots?
ChatGPT, developed by OpenAI, is an artificial intelligence chatbot like Google’s Gemini, Anthropic’s Claude, or Meta AI. These chatbots use a type of AI called a “large language model.” They process text and generate words that sound human.
“It’s almost boring now to say this,” said Daniel Dugas, an AI and robotics scientist based in Switzerland. He wrote a visualized explanation of earlier AI GPT models. “The fact that I can talk to my computer and have a semi-coherent conversation is — it’s just unbelievable.”
“As an engineer, I immediately was pushed to the direction of, OK, how do we make something like intelligence?” Dugas said.
While large language models may seem intelligent, they essentially just predict the next word — much like a phone’s text suggestions.
But it’s far more complex.
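To make the comparison with phone suggestions concrete, here is a minimal sketch in Python of a suggestion-style predictor. It only counts which word most often follows another in a tiny sample text. The sample sentences and the `suggest` function are invented for illustration, and a large language model does far more than this.

```python
from collections import Counter, defaultdict

# Tiny made-up sample text (an assumption for illustration only).
corpus = (
    "don't put all your eggs in one basket . "
    "she put all your books in one box . "
    "he carried the eggs in one basket ."
).split()

# For every word, count the words that come right after it.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def suggest(word):
    """Return the word seen most often after `word`, or None if unseen."""
    counts = following.get(word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

print(suggest("one"))
```

A predictor like this only looks at the single previous word. The steps described in the rest of this article let a model take the whole sentence into account.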
How ChatGPT works
Large language models are trained on vast amounts of data, ranging from books to social media to much of the internet. An LLM maps out word relationships similar to the way the human brain does.
Take the phrase “Don’t put all your eggs in one.” Once you enter it into an LLM and hit send, a series of steps runs over and over, all within a fraction of a second.
Step One: Tokenization and Encoding
Imagine the process as an assembly line. The first step on the assembly line is to turn the sentence into something computers can work with: numbers.
The sentence “Don’t put all your eggs in one” is broken down into pieces called “tokens,” and each token is assigned a number called a “token ID.” The IDs vary depending on the AI model. The sentence now becomes [91418, 3006, 722, 634, 27226, 306, 1001].
You can test out tokenization using OpenAI’s tool.
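As a rough illustration of this step, the sketch below uses a toy lookup table in Python. It pairs each word with the ID listed at the same position in the example above, which is an assumption made for readability. Real tokenizers split text into subword pieces and use vocabularies with many thousands of entries, so the names `TOY_VOCAB`, `encode` and `decode` are hypothetical.

```python
# Toy vocabulary: each word is paired with the ID at the same position
# in the article's example. This one-word-per-ID pairing is an assumption.
TOY_VOCAB = {
    "Don't": 91418,
    "put": 3006,
    "all": 722,
    "your": 634,
    "eggs": 27226,
    "in": 306,
    "one": 1001,
}
ID_TO_WORD = {token_id: word for word, token_id in TOY_VOCAB.items()}

def encode(sentence):
    """Turn a sentence into a list of token IDs using the toy vocabulary."""
    return [TOY_VOCAB[word] for word in sentence.split()]

def decode(token_ids):
    """Turn a list of token IDs back into text."""
    return " ".join(ID_TO_WORD[token_id] for token_id in token_ids)

ids = encode("Don't put all your eggs in one")
print(ids)
print(decode(ids))
```

The same table works in both directions, which is why the final step of the process can turn numbers back into words.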
Step Two: Embedding
Next, each number is expanded into a much longer list of numbers, called a vector, that is adjusted based on context.
For example, the word “egg” has a lot of different meanings and connotations. If you had to map out the word mathematically, one way is to plot it onto a graph between “chicken” and “young.” On a two-dimensional graph, that’s simple.
But “egg” has so many different meanings. “Egg” can be a part of an idiom, a breakfast ingredient, something associated with Easter, or a shape. Graphing this out would require a vector with far more dimensions than a person can picture. We can’t imagine this, but a computer can compute it.
With the sentence “Don’t put all your eggs in one,” the word “eggs” might be [27226].
With the sentence “I ate an egg for breakfast,” the word “egg” might be [16102]. It all depends on context. These contextual adjustments draw on the model’s training and its neural network of word relationships, and the changes are embedded into the vector.
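The idea of plotting words as points can be sketched with a few hand-made vectors. In the Python example below, every number is invented for illustration and each word gets only three dimensions. Real models learn their vectors during training and use hundreds or thousands of dimensions. Cosine similarity is one common way to measure how closely two vectors point in the same direction.

```python
import math

# Hand-picked 3-number vectors (assumptions, not values from any real model).
TOY_EMBEDDINGS = {
    "egg": [0.8, 0.6, 0.1],
    "chicken": [0.9, 0.3, 0.0],
    "young": [0.2, 0.9, 0.1],
    "basket": [0.1, 0.0, 0.9],
}

def cosine_similarity(a, b):
    """Return a score near 1.0 when two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    return dot / (length_a * length_b)

egg = TOY_EMBEDDINGS["egg"]
for word in ("chicken", "young", "basket"):
    score = cosine_similarity(egg, TOY_EMBEDDINGS[word])
    print(word, round(score, 3))
```

With these made-up numbers, “egg” lands closer to “chicken” and “young” than to “basket,” which mirrors the graph described above.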
Step Three: Transformer Architecture
The vector moves down the assembly line into a “transformer architecture.” It is a series of layers that make even more adjustments to the vector of numbers.
Based on its training, the AI has learned which words carry more weight. For example, in the sentence “Don’t put all your eggs in one” the word “eggs” matters more than “one.”
Adjustments to the vector of numbers occur repeatedly so that context and meaning stay consistent with the patterns in the data it was trained on.
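One way to picture how some words end up carrying more weight is the softmax calculation, which turns a list of raw scores into weights that add up to 1. The Python sketch below uses invented scores for each word. In a real transformer, the scores are computed from learned vectors rather than typed in by hand, so this is an illustration of the weighting step only.

```python
import math

def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]

words = ["Don't", "put", "all", "your", "eggs", "in", "one"]

# Hypothetical relevance scores, hand-picked for this example.
scores = [1.0, 1.5, 0.5, 0.2, 3.0, 0.3, 0.8]

weights = softmax(scores)
attention = dict(zip(words, weights))

for word, weight in attention.items():
    print(f"{word:>6}: {weight:.2f}")
```

Because “eggs” was given the highest score, it receives the largest share of the weight, while “one” receives much less.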
Step Four: Output
Finally, the result goes in reverse on the assembly line to turn a vector of numbers back into a word: basket.
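That last step can be sketched in Python as scoring a handful of candidate words and picking the most likely one. The candidate list and the scores below are invented for illustration. A real model scores every token in its vocabulary and may pick among several likely options rather than always taking the top one.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]

# Hypothetical candidates and scores for the word after
# "Don't put all your eggs in one" (assumptions, not model output).
candidates = ["basket", "place", "bowl", "day"]
raw_scores = [6.0, 2.5, 2.0, 0.5]

probabilities = softmax(raw_scores)
best_index = probabilities.index(max(probabilities))
next_word = candidates[best_index]

print(next_word)
```

The chosen word is then added to the sentence, and the whole process repeats to predict the word after that.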
"Don’t put all your eggs in one ... basket."
Is this advanced word prediction? Is this intelligence? Are there limits?
“You have papers saying, the model will never be able to create music or a model will never be able to answer a mathematical question,” Dugas said. “And they basically are crushed in the last five years.”
As large language models continue to advance, it’s important to keep up with what they can do and to know how we can work with them, not for them. Even a basic understanding will help people utilize, navigate, and legislate a technology some might consider revolutionary.