Transformers, the tech behind LLMs | Deep Learning Chapter 5

Breaking down how Large Language Models work, visualizing how data flows through them.
Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

---

Here are a few other relevant resources:

Build a GPT from scratch, by Andrej Karpathy
https://youtu.be/kCc8FmEb1nY

If you want a conceptual understanding of language models from the ground up, @vcubingx just started a short series of videos on the topic:
https://youtu.be/1il-s4mgNdI?si=XaVxj6bsdy3VkgEX

If you're interested in the herculean task of interpreting what these large networks might actually be doing, the Transformer Circuits posts by Anthropic are great. In particular, it was only after reading one of these that I started thinking of the value and output matrices together as a combined low-rank map from the embedding space to itself, which, at least in my mind, made things much clearer than other sources did.
https://transformer-circuits.pub/2021/framework/index.html

History of language models by Brit Cruise, @ArtOfTheProblem
https://youtu.be/OFS90-FX6pg

An early paper on how directions in embedding spaces have meaning:
https://arxiv.org/pdf/1301.3781.pdf

Russian audio track: Vlad Burmistrov.

---

Timestamps
0:00 - Predict, sample, repeat
3:03 - Inside a transformer
6:36 - Chapter layout
7:20 - The premise of Deep Learning
12:27 - Word embeddings
18:25 - Embeddings beyond words
20:22 - Unembedding
22:22 - Softmax with temperature
26:03 - Up next
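The remark about treating the value and output matrices as one low-rank map can be checked numerically. What follows is a minimal sketch under stated assumptions: the sizes `d_model = 6` and `d_head = 2` are hypothetical toy values, far smaller than in real models, and the matrices are filled with random integers rather than trained weights. The value matrix `W_V` maps an embedding down to `d_head` dimensions and the output matrix `W_O` maps it back up. Their product is therefore a `d_model × d_model` map from the embedding space to itself whose rank can be at most `d_head`.

```python
from fractions import Fraction
import random


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def rank(m):
    """Rank via Gaussian elimination, using exact fractions to avoid rounding error."""
    rows = [[Fraction(x) for x in row] for row in m]
    r = 0
    for c in range(len(rows[0])):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate column c from every other row.
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


random.seed(0)
d_model, d_head = 6, 2  # hypothetical toy sizes, not those of any real model

# W_V: projects an embedding (d_model) down to the head dimension (d_head).
W_V = [[random.randint(-3, 3) for _ in range(d_model)] for _ in range(d_head)]
# W_O: maps the head's output back up into the embedding space.
W_O = [[random.randint(-3, 3) for _ in range(d_head)] for _ in range(d_model)]

# Combined map from the embedding space to itself (d_model x d_model).
W_OV = matmul(W_O, W_V)
print(len(W_OV), len(W_OV[0]), rank(W_OV))
```

However large `d_model` gets, the combined map can never have rank greater than `d_head`, which is what "low-rank" refers to here.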


Transcript

The initials GPT stand for Generative Pretrained Transformer. So that first word is straightforward enough: these are bots that generate new text. Pretrained refers to how the model went through a process of learning from a massive amount of data, and the prefix implies that there's more room to fine-tune it on specific tasks with additional training. But the last word, that's the real key piece. A transformer is a specific kind of neural network, a machine learning model, and it's the core invention underlying the current boom in AI.

What I want to do with this video and the following chapters is go through a visually driven explanation for what actually happens inside a transformer. We're going to follow the data that flows through it and go step by step.

There are many different kinds of models that you can build using transformers. Some models take in audio and produce a transcript. This sentence comes from a model going the other way around, producing synthetic speech just from text. All those tools that took the world by storm in 2022, like DALL-E and Midjourney, which take in a text description and produce an image, are based on transformers. Even if I can't quite get it to understand what a pi creature is supposed to be, I'm still blown away that this kind of thing is even remotely possible.

And the original transformer, introduced in 2017 by Google, was invented for the specific use case of translating text from one language into another. But the variant that you and I will focus on, which is the type that underlies tools like ChatGPT, will be a model that's trained to take in a piece of text, maybe even with some surrounding images or sound accompanying it, and produce a prediction for what comes next in the passage. That prediction takes the form of a probability distribution over many different chunks of text that might follow. At first glance, you might think that predicting the next word feels like a very different goal from generating new text....
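The prediction described above, a probability distribution over chunks of text that might follow, is commonly produced by applying softmax to a vector of raw scores (logits). Generation then repeats a predict-then-sample step. What follows is a minimal sketch under stated assumptions: `toy_logits` is a hypothetical stand-in for the transformer, the five-word `VOCAB` is made up, and the `temperature` parameter echoes the "Softmax with temperature" chapter in the timestamps. It is illustrative only, not the video's own code.

```python
import math
import random

VOCAB = ["the", "cat", "sat", "on", "mat"]  # hypothetical toy vocabulary


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def toy_logits(tokens):
    """Hypothetical stand-in for a transformer: favors the word after the last one in VOCAB."""
    nxt = (VOCAB.index(tokens[-1]) + 1) % len(VOCAB)
    return [3.0 if i == nxt else 0.0 for i in range(len(VOCAB))]


def generate(prompt, n_new, temperature=1.0, rng=None):
    """Predict, sample, repeat: append one sampled token at a time."""
    rng = rng or random.Random(0)
    tokens = list(prompt)
    for _ in range(n_new):
        probs = softmax(toy_logits(tokens), temperature)
        tokens.append(rng.choices(VOCAB, weights=probs, k=1)[0])
    return tokens


print(softmax([2.0, 1.0, 0.0]))
print(generate(["the"], 4, temperature=0.5))
```

At `temperature=0.5` the toy model almost always picks its favored word. Raising the temperature spreads probability onto the other words. This sketch requires `temperature > 0`, since it divides by the temperature. A temperature of 0 is commonly handled as a special case meaning "always pick the most likely token."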

3Blue1Brown
