A very short history of GPTs before ChatGPT

For many of us, artificial intelligence (AI) seemed to arrive with ChatGPT. But the field of AI is hardly new—the term itself is almost 70 years old—and the core technology that powers the generative AI software we use today is something University of ÌÇÐÄvlog¹ÙÍø staff and students had been working with well before ChatGPT burst onto the scene.

Most AI scientists trace ChatGPT’s origin to a landmark 2017 paper by eight researchers at Google titled “Attention Is All You Need” (Vaswani et al., 2017), which proposed a novel AI architecture called the transformer. A key aspect of their approach was something called self-attention — a mechanism by which the AI model weighs the importance of each word in a sentence relative to every other word, allowing it to capture context and dependency. Crucially, the transformer calculates all of these weights simultaneously rather than one word at a time, which makes it fast. For those of us who aren’t computer scientists, Wired’s account of how the transformer came to be is a fascinating read (Levy, 2017).
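The core of that idea can be sketched in a few lines of code. The toy example below (in Python with NumPy; the function name and example embeddings are illustrative, and the learned query/key/value projections of a real transformer are omitted) shows how every word's attention weights over every other word are computed in one matrix operation:

```python
import numpy as np

def self_attention(X):
    """Toy scaled dot-product self-attention, without learned weights.

    X: array of shape (seq_len, d) -- one row per word embedding.
    Returns contextualised embeddings: each output row is a weighted
    mix of ALL input rows, with the weights computed all at once.
    """
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)                   # similarity of every word to every word
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # softmax: each row of weights sums to 1
    return weights @ X                              # blend the embeddings by those weights

# Three made-up "word" embeddings; one call computes all pairwise attention.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
out = self_attention(X)
print(out.shape)
```

Each output row is a context-aware version of the corresponding input word, because it has absorbed information from the whole sentence in a single step.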

The transformer didn’t immediately spark an AI revolution. Google was slow to see its potential, but a small startup named OpenAI was not.

Most people know OpenAI from the first version of ChatGPT, which was based on an AI model called GPT-3.5 and released in November 2022. While the chatbot interface is what made ChatGPT immensely popular, OpenAI’s first model (GPT-1) was published in June 2018 and created great excitement among AI researchers and engineers.