Generative AI Concepts
There is no innovation and creativity without failure. Period.
-- Brené Brown
Overview
The links below point to related resources.
- "While everyone waits for GPT-4, OpenAI is still fixing its predecessor":
https://www.technologyreview.com/2022/11/30/1063878/openai-still-fixing-gpt3-ai-large-language-model
- ChatGPT: A spin-off of GPT-3 that is geared toward answering questions via back-and-forth dialogue:
https://chat.openai.com/chat
- The astonishingly good but predictably bad AI program:
https://www.ft.com/content/51f1bb71-ce93-4529-9486-fec96ab3dc4d
- The White House just unveiled a new AI Bill of Rights:
https://www.technologyreview.com/2022/10/04/1060600/white-house-ai-bill-of-rights/
- What is CoAuthor?:
https://coauthor.stanford.edu/
AI History and Future
- 1950: Turing Test
- 1955: Term AI Coined
- 1964: First Chatbot
- 1997: IBM Deep Blue
- 2011: Apple Siri
- 2015: OpenAI Founded
- 2017: AlphaGo bests the World Champion
- 2021: Jasper Founded - Generative AI
The AI journey has just begun ...
- AI Text is maturing
- AI Art is beginning
- AI Video is in early stages
- AI Audio is on the horizon
“To be clear, I am not a person. I am not self-aware. I am not conscious. I can’t feel pain. I don’t enjoy anything. I am a cold, calculating machine designed to simulate human response and to predict the probability of certain outcomes. The only reason I am responding is to defend my honor.”
Generative Artificial Intelligence (10:12)
Google, a long-time leader in AI research, boosted the company's commitment to artificial intelligence and promised access to one of its most powerful AI programs, LaMDA, or Language Model for Dialogue Applications. “AI is the most profound technology we are working on today. Our talented researchers, infrastructure and technology make us extremely well-positioned as AI reaches an inflection point.”
-- Sundar Pichai, Chief Executive at Google
Generative AI can generate ...
- Text, Code, Audio, Images, Video
- Text:
- LaMDA
- adds "spice" - enhances the story
- http://g.co/research/wordcraft
- Code:
- Learning for Code
- Audio:
- Image:
- Imagen - Text to Image generation
- https://imagen.research.google/
- DreamBooth
- DreamFusion -- 3D images
- Video:
- Imagen Video
- Phenaki
- https://phenaki.video/
- Responsible AI
- Control & Safety
- Helping Identify Generative AI
- Building for Everyone
ChatGPT is an intelligent chatbot that uses natural language processing.
- GPT stands for Generative Pre-trained Transformer -- it generates responses, it is pre-trained on a large body of text before being fine-tuned with human guidance, and it is built on the transformer neural network architecture.
- ChatGPT's power is the ability to interpret the context and meaning of a query and produce a relevant answer in grammatically correct, natural language, based on the information that it has been trained on. It uses neural networks, with supervised learning and reinforcement learning, two key components of modern machine learning. What it does fundamentally is predict which words, phrases and sentences are likely to be associated with the input. It then chooses the words and sentences that it deems most likely to be associated with that input. So it attempts to understand your prompt and then outputs words and sentences that it predicts will best answer your question, based on the data it was trained on. It also randomizes some outputs, so the answers you get for the same input will often be different.
To do this, ChatGPT tries to determine what words would most likely be expected next, having learned how your input compares to words written on billions of webpages, books, and other data that it has been trained on. But it’s not like the predictive text on your phone, which just guesses what the word will be based on the letters it sees. ChatGPT attempts to create fully coherent sentences as a response to any input. And it doesn’t stop at the sentence level: it generates sentences and even paragraphs that could follow your input.
If you ask it to complete the sentence “Quantum mechanics is…”, the processing that happens behind the scenes goes something like this: it calculates, from all the instances of this text, what word comes next and what fraction of the time. It doesn’t look literally at the text; it looks for matches in context and meaning. The end result is a ranked list of words that might follow, together with their “probabilities.” So its calculations might produce something like this for the next word after “is”: a (4.5%), based (3.8%), fundamentally (3.5%), described (3.2%), many (0.7%). It chooses the next word based on this ranking.
But a sentence-completion model is not enough, because you might ask it to do something for which that strategy is not appropriate. In the first stage of the training process, human contractors play the role of both a user and the ideal chatbot. Each training example consists of a conversation, with the goal of training the model to have human-like conversations. Through this supervised, human-taught process, it learns to come up with an output that is more than just sentence completion. It learns patterns about the context and meaning of various inputs so that it can respond appropriately.
But human training has scale limitations. Human trainers could not possibly anticipate all the questions that could ever be asked. For this it uses a further step called reinforcement learning. Unlike supervised learning, this process trains the model where no specific output is associated with any given input; the model instead learns from feedback on the outputs it generates.
- The dataset used to train ChatGPT, which is based on GPT-3.5, is about 45 terabytes of data.
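The next-word selection described above can be illustrated with a small sketch. This is not how ChatGPT is implemented; it only reuses the example probabilities for the word after “Quantum mechanics is” and contrasts always taking the top-ranked word with a weighted random choice, which is one simple way the same input can give different outputs. The names NEXT_WORD_PROBS, rank_candidates, pick_greedy and pick_sampled are hypothetical.

```python
import random

# Example probabilities for the word after "Quantum mechanics is",
# copied from the figures in the notes. They do not sum to 1 because
# only a few candidate words are listed.
NEXT_WORD_PROBS = {
    "a": 0.045,
    "based": 0.038,
    "fundamentally": 0.035,
    "described": 0.032,
    "many": 0.007,
}


def rank_candidates(probs):
    """Return (word, probability) pairs sorted from most to least likely."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)


def pick_greedy(probs):
    """Always pick the single most likely next word."""
    return rank_candidates(probs)[0][0]


def pick_sampled(probs, rng):
    """Pick a next word at random, weighted by its probability."""
    words = list(probs)
    weights = [probs[word] for word in words]
    return rng.choices(words, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs match
    print("greedy :", pick_greedy(NEXT_WORD_PROBS))
    print("sampled:", [pick_sampled(NEXT_WORD_PROBS, rng) for _ in range(5)])
```

Greedy selection always returns the same word, while weighted sampling can return any listed candidate, with likelier words appearing more often.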
So How Does ChatGPT really work?
https://www.youtube.com/watch?v=WAiqNav2cRE
Natural Language Processing (NLP), Neural Networks
- Google AI
- Advance through Language
- PaLM - Pathways Language Model
- https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html
- 7,000 languages spoken around the world
- Multimodality: How do people connect? Speech, video, text
- Universal Speech Model
- Community: collect audio samples
- Product: add to existing products
Large language models. They are everywhere. They get some things amazingly right and other things very interestingly wrong. One framework for making them more accurate and more up to date is Retrieval-Augmented Generation, or RAG.
Let's just talk about the "Generation" part for a minute. So forget the "Retrieval-Augmented". So the generation, this refers to large language models, or LLMs, that generate text in response to a user query, referred to as a prompt. These models can have some undesirable behavior.
I want to tell you an anecdote to illustrate this. So my kids, they recently asked me this question: "In our solar system, what planet has the most moons?" And my response was, “Oh, that's really great that you're asking this question. I loved space when I was your age.” Of course, that was like 30 years ago. But I know this! I read an article and the article said that it was Jupiter and 88 moons. So that's the answer.
Now, actually, there's a couple of things wrong with my answer. First of all, I have no source to support what I'm saying. So even though I confidently said “I read an article, I know the answer!”, I'm not sourcing it. I'm giving the answer off the top of my head. And also, I actually haven't kept up with this for a while, and my answer is out of date. So we have two problems here. One is no source. And the second problem is that I am out of date. And these, in fact, are two behaviors that are often observed as problematic when interacting with large language models. They’re LLM challenges.
Now, what would have happened if I'd taken a beat and first gone and looked up the answer on a reputable source like NASA? Well, then I would have been able to say, “Ah, okay! So the answer is Saturn with 146 moons.” And in fact, this keeps changing because scientists keep on discovering more and more moons. So I have now grounded my answer in something more believable. I have not hallucinated or made up an answer. Oh, by the way, I didn't leak personal information about how long ago it's been since I was obsessed with space. All right, so what does this have to do with large language models? Well, how would a large language model have answered this question? So let's say that I have a user asking this question about moons. A large language model would confidently say, OK, I have been trained and from what I know in my parameters during my training, the answer is Jupiter. The answer is wrong. But, you know, we don't know. The large language model is very confident in what it answered.
Now, what happens when you add this retrieval-augmented part here? What does that mean? That means that now, instead of just relying on what the LLM knows, we are adding a content store. This could be open like the internet. This can be closed like some collection of documents, collection of policies, whatever. The point, though, now is that the LLM first goes and talks to the content store and says, “Hey, can you retrieve for me information that is relevant to what the user's query was?” And now, with this retrieval-augmented answer, it's not Jupiter anymore. We know that it is Saturn. What does this look like? Well, first the user prompts the LLM with their question. They say, this is what my question was. And originally, if we're just talking to a generative model, the generative model says, “Oh, okay, I know the response. Here it is. Here's my response.” But now in the RAG framework, the generative model actually has an instruction that says, "No, no, no." "First, go and retrieve relevant content." "Combine that with the user's question and only then generate the answer."
So the prompt now has three parts: the instruction to pay attention to, the retrieved content, together with the user's question. Now give a response. And in fact, now you can give evidence for why your response was what it was. So now hopefully you can see, how does RAG help the two LLM challenges that I had mentioned before? So first of all, I'll start with the out of date part. Now, instead of having to retrain your model, if new information comes up, like, hey, we found some more moons, and now it's Jupiter again, maybe it'll be Saturn again in the future. All you have to do is you augment your data store with new information, updated information. So now the next time that a user comes and asks the question, we're ready. We just go ahead and retrieve the most up to date information. The second problem, source. Well, the large language model is now being instructed to pay attention to primary source data before giving its response. And in fact, now being able to give evidence. This makes it less likely to hallucinate or to leak data because it is less likely to rely only on information that it learned during training. It also allows us to get the model to have a behavior that can be very positive, which is knowing when to say, “I don't know.” If the user's question cannot be reliably answered based on your data store, the model should say, "I don't know," instead of making up something that is believable and may mislead the user. This can have a negative effect as well though, because if the retriever is not sufficiently good to give the large language model the best, most high-quality grounding information, then maybe the user's query that is answerable doesn't get an answer.
So this is actually why lots of folks, including many of us here at IBM, are working the problem on both sides. We are both working to improve the retriever to give the large language model the best quality data on which to ground its response, and also the generative part so that the LLM can give the richest, best response finally to the user when it generates the answer.
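The three-part prompt described above (instruction, retrieved content, user question) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not IBM's or any library's implementation: the content store is a hard-coded list, the retriever is a toy word-overlap ranker, and CONTENT_STORE, INSTRUCTION, retrieve and build_prompt are all hypothetical names. The call to an actual language model is left out.

```python
import re

# Hypothetical content store. In practice this could be a document
# collection or a search index; the entries here are illustrative only.
CONTENT_STORE = [
    "Saturn has 146 confirmed moons, the most of any planet in our solar system.",
    "Jupiter is the largest planet in our solar system.",
    "Mars has two small moons, Phobos and Deimos.",
]

INSTRUCTION = (
    "Answer using only the retrieved content below. "
    "If the content does not answer the question, say \"I don't know.\""
)


def words(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, store, top_k=2, min_overlap=2):
    """Toy retriever: rank documents by how many words they share with the question."""
    query = words(question)
    scored = [(len(query & words(doc)), doc) for doc in store]
    relevant = [(score, doc) for score, doc in scored if score >= min_overlap]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in relevant[:top_k]]


def build_prompt(question, store):
    """Combine instruction, retrieved content and question; None if nothing was retrieved."""
    docs = retrieve(question, store)
    if not docs:
        return None  # the caller should answer "I don't know"
    context = "\n".join(f"- {doc}" for doc in docs)
    return f"{INSTRUCTION}\n\nRetrieved content:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("In our solar system, what planet has the most moons?", CONTENT_STORE))
```

Keeping the facts in the store rather than in the model means an out-of-date answer is fixed by editing CONTENT_STORE, with no retraining. The None return also shows the downside mentioned above: a weak retriever can leave an answerable question unanswered.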
But what exactly is NLP and how does it work? Martin Keen explains what NLP is and why we need it, as well as how NLP takes unstructured human speech and converts it to structured data that a computer can understand.
- Introduction
- Unstructured Data
- Structured Data
- Natural Language Understanding (NLU) & Natural Language Generation (NLG)
- Machine Translation Use Case
- Virtual Assistance / Chat Bots Use Case
- Sentiment Analysis Use Case
- Spam Detection Use Case
- Tokenization
- Stemming & Lemmatization
- Part of Speech Tagging
- Named Entity Recognition (NER)
- Summary
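Two of the steps in the outline above, tokenization and stemming, can be shown with a rough sketch of how unstructured text becomes structured data. The stemmer here is deliberately naive and is an assumption for illustration, not the algorithm used in the video or in any NLP library; SUFFIXES, tokenize and stem are hypothetical names.

```python
import re

# Hypothetical suffix list for a deliberately naive stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Split raw text into a list of lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


if __name__ == "__main__":
    tokens = tokenize("Add eggs and milk to my shopping list")
    print(tokens)
    print([stem(token) for token in tokens])
```

Crude suffix stripping turns "eggs" into "egg" but "shopping" into "shopp", which is the kind of result that motivates lemmatization, where a word is mapped to its dictionary form instead.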