1.58万字!2024 GTC黄仁勋完整版演讲全文+视频来了!
来源:数字开物
文章已获授权
北京时间2024年3月19日凌晨,英伟达创始人兼首席执行官黄仁勋在2024年GTC大会上进行了精彩的演讲,以下为演讲核心内容概述:
1. 加速计算的重要性: 黄仁勋强调了加速计算对于推动各行各业的数字化转型的作用,特别是在气候科技、工业仿真、生命科学和机器人领域。
他宣布了与Ansys、Synopsys、Cadence等公司的合作,加速了他们的生态系统,并将它们连接到Omniverse云(英伟达Omniverse本质是英伟达专为实时协作和仿真模拟打造的一站式工具集成平台)。
此外,还讨论了生成型AI在半导体制造中的重要性,以及为满足大型语言模型的计算需求而需要更大的GPU的必要性。
2.全新的Blackwell架构: 黄仁勋展示了Blackwell,一款为生成性人工智能而设计的新一代GPU平台,它拥有2080亿个晶体管,可以处理数万亿个参数的模型。
他还展示了Blackwell的两种系统配置,以及一些新的特性,如自适应的张量核心、高速的NVLink开关和安全的AI加密。Blackwell集成了安全AI能力,包括在静止和传输时的加密,以及高速压缩引擎。
该GPU旨在降低AI计算的成本和能耗,实现更大模型的训练和计算能力的扩展。它拥有强劲的算了性能,与Hopper的比较,由两片 B200 组成的 GB200 在基于 1750 亿组参数 GPT-3 模型的基准测试中,其性能是 H100 的 7 倍、训练速度则提高了 4 倍。GPT-MOE-1.8T参数模型,可以在2000台Blackwell GPUs上90天完成训练,相比使用H100仅需要四分之一的能源。
3.生成性人工智能的应用: 黄仁勋还展示了一些令人惊叹的生成性人工智能的应用,如用于天气预测的CorrDiff模型,用于药物发现的BioNeMo,用于文本摘要和对话的NeMo服务,以及用于人形机器人学习的Groot模型。
他还宣布了与AWS、Google、Oracle、Microsoft等云服务提供商的合作,以及与SAP、ServiceNow、Cohesity、Snowflake、NetApp等IT平台的合作,利用Nvidia AI Foundry为客户提供定制的AI解决方案。
同时,黄仁勋在演讲中宣布全球最大电动车公司比亚迪,将采用英伟达的下一代人工智能汽车芯片Thor,主要用于自动驾驶模型训练。
最后,黄仁勋以一场由Nvidia Omniverse和AI创造的虚拟音乐会作为演讲的高潮,展示了计算机图形学、物理学和人工智能的结合。他还邀请了迪士尼的智能机器人作为特别嘉宾,展示了人形机器人的最新进展。
以下为2小时演讲核心内容概述:
CEO黄仁勋在GTC大会上的演讲聚焦于NVIDIA在气候科学、无线电科学、AI、机器人、自动驾驶等领域的贡献与未来展望。黄仁勋强调了加速计算对于AI在非IT行业的应用,及其对于解决行业问题的重要性。
他提到了CUDA的发展历程,DGX1计算机的创新,以及AI与大数据中心的未来。特别提出了Blackwell GPU的先进性,以及它在提高计算效率和支持未来AI模型发展中的关键角色。此外,黄仁勋展望了数字孪生、AI在医疗、天气预测等领域的应用前景,并强调了AIGC(AI生成内容)对于软件开发的影响。欢迎各位参加GTC大会。
今天,我们聚焦于气候科学、无线电科学、AI、机器人、自动驾驶汽车等领域的前沿进展。我想强调的是,使用加速计算技术,非IT行业正在AI领域取得突破,解决行业特定问题,预示着一个高达100万亿元的产业新纪元。
NVIDIA的里程碑
自1993年成立以来,NVIDIA在计算领域的旅程充满了创新。2006年,我们推出了CUDA,一个革命性的并行计算平台和编程模型,它为加速计算铺平了道路。
2012年,CUDA首次与AI相结合,至2016年,我们推出了DGX1,一款将八个GPU结合在一起的强大计算机,大幅提高了AI模型的训练效率。2022年,ChatGPT的推出让全世界认识到AI的潜力。AI不仅是新产业的催化剂,而且正在重塑软件开发过程,推动数据中心的演进。
AI工厂的概念
就像第一次工业革命时人们意识到需要能量一样,现在我们认识到构建未来的“AI工厂”需要新型基础设施。在这个视角下,我们不仅需要提高计算的规模,而且需要找到可持续的方法来降低计算成本。
技术展望
我们正处于加速计算的临界点。通过与Ansys、Synopsys、Cadence等合作伙伴的共同努力,我们在加速光刻技术、流体动力学模拟等领域取得了重大进展。此外,我们正致力于扩大LLM(大型语言模型)的规模,以支持未来AI的需求。目前,最先进的模型拥有1.8万亿个参数,未来的需求将会更加庞大。
Blackwell GPU的介绍
我们引入了世界上最先进的GPU—Blackwell,拥有2080亿晶体管,它不仅在性能上超越了前代产品,还在AI领域的应用中提供了突破性的进展。Blackwell GPU的设计,特别是其与CPU的深度集成,第五代NVLink,以及加强的安全特性,标志着我们对未来计算需求的深刻理解和前瞻性布局。
应用领域与合作伙伴
NVIDIA的技术已被广泛应用于天气预测、医疗影像、基因测序等多个领域。通过与亚马逊、谷歌、甲骨文、微软等企业的合作,我们正共同优化和准备未来AI的发展。特别地,我们通过创新方法,如本地化部署大模型软件,使企业能够更有效地利用AI技术。
未来展望
我们正站在新一代AI技术的门槛上。下一代AI将不仅仅通过阅读语言模仿人类,而是通过观察、学习和理解物理世界来提高其能力。我们与西门子的合作关系,以及我们在自动驾驶和机器人技术方面的进步,都是朝着这一目标迈进的实证。NVIDIA正引领一场新的工业革命,通过加速计算和AI生成内容(AIGC),开启了数据中心市场的新纪元。我们所追求的,不仅仅是技术的进步,而是通过技术创新,推动社会进步和行业变革。
(2小时完整版视频)
以下为2小时演讲原文完整版:
Jensen Wong:
Welcome to GTC. I hope you realize this is not a concert. You have arrived at a developers conference. There will be a lot of science described algorithms, computer architecture, mathematics.
I sensed a very heavy wait in the room all of a sudden, almost like you were in the wrong place. No, no conference in the world. Is there a greatest assembly of researchers from such diverse fields of science, from climate tech to radio sciences, trying to figure out how to use AI to robotically control mimos for next generation 6 G radios, robotic self driving cars, even artificial intelligence, even artificial intelligence, their brace? First, I noticed a sense of relief there all of a sudden. Also, this conference is represented by some amazing companies. This list, this is not the attendees, these are the presenters. And what's amazing is this, if you take away all of my friends, close friends, Michael Dell is sitting right there in the It industry.
All of the friends I grew up with in the industry, if you take away that list, this is what's amazing. These are the presenters of the non It industries using accelerated computing to solve problems that normal computers can't. It's rip represented in life sciences, healthcare, genomics, transportation, of course, retail, logistics, manufacturing, industrial. The gamut of industries represented is truly amazing. And you're not here to attend, only you're here to present. Talk about your research, $100 trillion of the world's industries is represented in this room today. This is absolutely amazing.
There is absolutely something happening. There is something going on. The industry is being transformed, not just hours because the computer industry, the computer is the single most important instrument of society today. Fundamental transformations in computing affects every industry, but how did we start?
How did we get here? I made a little cartoon for you. Literally, I drew this in one page. This is Nvidia's journey, started in 1993. This might be the rest of the talk, 1993, this is our journey. We were founded in 1993. There are several important events that happen along the way. I'll just highlight a few in 2006 kuda, which has turned out to have been a revolutionary computing model. We thought it was revolutionary then. It was going to be an overnight success. And almost 20 years later, it happened. We saw her coming. Two decades later.
In 2012, Alexnet AI and kuda made first contact in 2016, recognizing the importance of this computing model, we invented a brand new type of computer we call the dgx 1 1 170 teraflops. In this supercomputer, 8 Gpu's connected together for the very first time. I hand delivered the very first dgx 1 to a startup located in San Francisco called OpenAI.
Dgx 1 was the world's first AI supercomputer. Remember 170 teraflops 2017, the Transformer arrived 20 2022 ChatGPT captured the world's management imaginations have people realize the importance and the capabilities of artificial intelligence. And 2023 generative AI emerged and a new industry begins.
Why? Why is a new industry? Because the software never existed before. We are now producing software using computers to write software, producing software that never existed before. It is a brand new category, it took share from nothing, it's a brand new category, and the way you produce the software is unlike anything we've ever done before in data centers, generating tokens, producing floating point numbers at very large scale, as if in the beginning of this last industrial revolution when people realized that you would set up factories, apply energy to it, and this invisible valuable thing called electricity came out AC generators, and 100 years later, 200 years later, we are now creating new types of electrons, tokens using infrastructure, we call factories AI factories to generate this new incredibly valuable thing called artificial intelligence.
A new industry has emerged. Well, we're to talk about many things about this new industry. We're going to talk about how we're going to do computing next, we want to talk about the type of software that you build because of this new industry, the new software, how you would think about this new software, What about applications in this new industry? And then maybe what's next and how can we start preparing today for what is about to come next? Well, but before I start, I want to show you the soul of Nvidia, the soul of our company at the intersection of computer graphics, physics and artificial intelligence, all intersecting inside a computer in Omniverse, in a virtual world simulation. Everything we're going to show you today, literally everything we're going to show you today, is a simulation, not animation. It's only beautiful because it's physics. The world is beautiful, it's only amazing because it's being animated with robotics, it's being animated with artificial intelligence, what you're about to see all day, it's completely generated completely simulated and Omniverse and all of it, what you're about to enjoy is the world's first concert where everything is homemade.
Everything is homemade. You're about to watch some home videos, so sit back and enjoy yourself, God, I love Nvidia.
Accelerated computing has reached the tipping. General purpose computing has run out of steam. We need another way of doing computing so that we can continue to scale, so that we can continue to drive down the cost of computing so that we can continue to consume more and more computing while being sustainable. Accelerated computing is a dramatic speed up over general purpose computing. And in every single industry we engage, and I'll show you many, the impact is dramatic, but in no industry is it more important than our own.
The industry of using simulation tools to create products. In this industry, it is not about driving down the cost of computing, it's about driving up the scale of computing.
We would like to be able to simulate the entire product that we do completely in full fidelity, completely digitally, and essentially what we call digital twins. We would like to design it, build it, simulate it, operate it completely digitally. In order to do that, we need to accelerate an entire industry, and today I would like to announce that we have some partners who are joining us in this journey to accelerate their entire ecosystem so that we can bring the world into accelerated computing. But there's a bonus. When you become accelerated, your infrastructure is kuda Gpu's, and when that happens, it's exactly the same infrastructure for generative AI. And so I'm just delighted to announce several very important partnerships.
There are some of the most important companies in the world. Ansys does engineering simulation for what the world makes. We're partnering with them to COO, to accelerate the Ansys ecosystem, to connect Ansys to the Omniverse digital Twin Incredible. The thing that's really great is that the installed base of media GPU accelerated systems are all over the world, in every cloud, in every system, all over enterprises. And so the applications they accelerate will have a giant installed base to go serve. End users will have amazing applications. And of course, system makers and Csp's will have great customer demand.
Synopsis synopsis is invidious, literally first software partner, they were there in the very first day of our company synopsis revolutionized the chip industry with high level design, we are going to kuda accelerate synopsis, we're accelerating computational lithography, one of the most important applications that nobody's ever known about. In order to make chips, we have to push lithography to a limit. Nvidia has created a librarian domain specific library that accelerates computational lithography incredibly once we can accelerate and software define all of TSMC who is announcing today that they're going to go into production with Nvidia culliton. Once it's software defined and accelerated, the next step is to apply generative AI to the future of semiconductor manufacturing, pushing geometry even further.
Cadence builds the world's essential Eda and SDA tools. We also use cadence between these three companies.
Ansys synopsis and Cadence, we basically build Nvidia together. We are good to accelerating Cadence. They're also building a supercomputer out of Nvidia Gpu's so that their customers could do fluid dynamic simulation at 100, a thousand times scale, basically a wind tunnel in real time. Cadence Millennium, a supercomputer with Nvidia Gpu's inside a software company building supercomputers. I love seeing that building cadence Copilots together. Imagine a day when Cadence could synopsis Ansys tool providers would offer you AI co-pilots so that we have thousands and thousands of Copilot assistants helping us design chips design systems and we're also going to connect kaden's digital twin platform to Omniverse. As you can see the trend here, we're accelerating the world's CE Eda and SDA so that we could create our future in digital twins, and we're going to connect them all to Omniverse, the fundamental operating system for future digital twins, one of the industries that benefited tremendously from scale. And you know, you all know this one very well.
Large language models. Basically, after the transformer was invented, we were able to scale large language models at incredible rates, effectively doubling every six months. Now, how is it possible that by doubling every six months that we have grown the industry, we have grown the computational requirements so far? And the reason for that is quite simply this, If you double the size of the model, you double the size of your brain, you need twice as much information to go fill it. And so every time you double your parameter count, you also have to appropriately increase your training token count. The combination of those two numbers becomes the computation scale.
You have to support the latest, the state of the art OpenAI model is approximately 1.8 trillion parameters, 1.8 trillion parameters required several trillion tokens to go train. So a few trillion parameters on the order of a few trillion tokens on the order of when you multiply the two of them together, approximately 30, 40, 50 billion quadrillion floating point operations per second. Now we just have to do some Co math right now. Just hang with me. So you have 30 billion quadrillion, 1 quadrillion is like a pedda, And so if you had a Peta flop GPU, you would need 30 billion seconds to go compute, to go train that model. 30 billion seconds is approximately 1000 years, while 1000 years, it's worth it.
I'd like to do it sooner, but it's worth it. Which is usually my answer when most people tell me, hey, how long, how long is it going to take to do something? So we have 20 years. It's worth it. But can we do it next week? And so 1000 years, 1000 years.
So what we need, what we need, our bigger Gpus, we need much, much bigger Gpu's. We recognize this early on, and we realized that the answer is to put a whole bunch of Gpu's together and of course, innovate a whole bunch of things along the way, like inventing tensor cores, advancing Mv links so that we could create essentially virtually giant Gpu's and connecting them all together with amazing networks from a company called mellanox Infiniband so that we could create these giant systems. And so dgx 1 was our first version, but it wasn't the last we built. We build supercomputers all the way all along the way in 2021, we had Celine 4500 Gpu's or so. And then in 2023, we built one of the largest AI supercomputers in the world.
It's just come online eels. And as we're building these things, we're trying to help the world build these things and in order to help the world build these things, we got to build them first. We build the chips, the systems, the networking, all of the software necessary to do this. You should see these systems.
Imagine writing a piece of software that runs across the entire system, distributing the computation across thousands of Gpu's, but inside are thousands of smaller Gpu's, millions of Gpu's to distribute work, across all of that and to balance the workload so that you can get the most energy efficiency, the best computation time, keep your costs down. And so those, those fundamental innovations is what got us here.
And here we are as we see the miracle of ChatGPT emerge in front of us, we also realize we have a long ways to go, we need even larger models, we're going to train it with multi modality data, not just text on the internet, but we're going to train it on texts and images and graphs and charts. And just as we learn watching TV. And so there's going to be a whole bunch of watching video so that these models can be grounded in physics understands that an arm doesn't go through a wall. And so these models would have common sense by watching a lot of the world's video combined with a lot of the world's languages. It'll use things like synthetic data generation, just as you and I do when we try to learn, we might use our imagination to simulate how it's going to end up, just as I, when I was preparing for this keynote, I was simulating it all along the way. I hope it's going to turn out as well as I had into my head.
As I was simulating how this keynote was going to turn out, somebody did say that another performer did her performance completely on a treadmill so that she could be in shape to deliver it with full energy. I didn't do that. If I get a low wind and about 10 minutes into this, you know what happened. And so, so where were we, we're seen here using synthetic data generation.
We're going to use reinforcement learning. We're going to practice it in our mind, we're going to have AI working with AI training each other, just like student, teacher, debaters. All of that is going to increase the size of our model. It's going to increase the amount of data that we have, and we're going to have to build even bigger Gpu's. Hopper is fantastic, but we need bigger Gpus. And so ladies and gentlemen, I would like to introduce you to a very, very, very big GPU.
Named after David Blackwell, a mathematician, game theorists probability we thought it was a perfect name. Blackwell, ladies and gentlemen, enjoy this.
Yeah.
Blackwell is not a chip. Blackwell is the name of a platform. People think we make Gpus and we do, but Gpu's don't look the way they used to. Here's the, if you will, the heart of the Blackwell system. And this inside the company is not called Blackwell is just the number and I this, this is Blackwell sitting next to Oh, this is the most advanced GPU in the world in production today. This is Hopper, this is hopper. Hopper changed the world. This is Blackwell.
It's okay hopper.
You're very good. Good, good boy. What the girl? 208 billion transistors.
And so you could see, I can see that there's a small line between 2 dyes. This is the first time 2 dyes have a button like this together in such a way that the two dies think it's one chip. There's 10 TB of data between it, 10 TB per second. So that these two, these two sides of the Blackwell chip have no clue which side they're on. There's no memory locality issues, no cash issues. It's just one giant, giant chip. And so when we were told that Blackwell's ambitions were beyond the limits of physics, the engineer said, so what? And so this is what happened, and so this is the Blackwell chip.
And it goes into two types of systems. The first one, it's form fit function compatible to Hopper. And so you slide on Hopper and you push in Blackwall. That's the reason why one of the challenges of ramping is going to be so efficient. There are installations of hoppers all over the world and they could be, they could be, you know, the same infrastructure, same design, the power, the electricity, the thermals, the software, identical, push it right back. And so this is a hopper version for the current hgx configuration. And this is what the other, the second hopper looks like this. Now this is a prototype board and Janine, could I just borrow ladies and John and Janine Paul?
And so this is a fully functioning board. And I'll just be careful here. This right here is, I don't know, $10 billion. The second one's five. It gets cheaper after that. So any customers in the audience, it's okay. No, all right. But this is, this one's quite expensive.
This is the bring up board and the way it's going to go to production is like this one here, okay? And so you're going to take take this, it has 2 Blackwell die, 2 Blackwell chips and 4 Blackwell dyes connected to a Grace CPU. The Grace CPU has a super fast chip to chip link. What's amazing is this computer, first of its kind, where this much computation, first of all, fits into this small of a place. Second, it's memory coherent. They feel like they're just one big happy family working on one application location together, and so everything is coherent within it, just the amount of, you know, you saw the numbers, there's a lot of terabytes this and terabytes that's, but this is, this is a miracle.
This is a this. Let's see, what are some of the things on here? There's an Mv link on top PCI express on the bottom on on your which one is my and your left one of them it doesn't matter one of the one of them is a CPU chip to chip link is my left or you're depending on which side I was just I was trying to sort that out and I just kind of doesn't matter i'. Hopefully it comes plugged in so. Okay, so this is the Grace Blackwell system.
But there's more. So it turns out, it turns out all of the specs is fantastic, but we need a whole lot of new features in order to push the limits beyond, if you will, the limits of physics. We would like to always get a lot more X factors. And so one of the things that we did was we invented another transformer engine. Another transformer engine, the second generation, it has the ability to dynamically and automatically rescale and recast numerical formats to a lower precision.
Whenever you can remember, artificial intelligence is about probability. And so you kind of have, you know, 1.7, approximately 1.7 times approximately 1.4 to be approximately something else. Does that make sense? And so the ability for the mathematics to retain the precision and the range necessary in that particular stage of the pipeline, super important.
And so this is, it's not just about the fact that we designed a smaller Alu. It's not quite, the world's not quite that simple. You've got to figure out when you can use that across a computation that is thousands of Gpu's. It's running for weeks and weeks on weeks, and you want to make sure that the training job is going to converge.
And so this new transformer engine, we have a fifth generation NV link. It's now twice as fast as Hopper, but very importantly, it has computation in the network. And the reason for that is because when you have so many different Gpu's working together, we have to share our information with each other. We have to synchronize and update each other. And every so often we have to reduce the partial products and then rebroadcast out the partial products that some of the partial products back to everybody else. And so there's a lot of what is called all reduce and all to all and all gather.
It's all part of this area of synchronization and collectives so that we can have Gpu's working with each other, having extraordinarily fast links and being able to do mathematics right in the network allows us to essentially amplify even further.
So even though it's 1.8 TB per second, it's effectively higher than that. And so it's many times that of Hopper, the likelihood of a supercomputer running for weeks on end is approximately 0. And the reason for that is because there's so many components working at the same time. The statistic, the probability of them working continuously is very low. And so we need to make sure that whenever there is a well, we checkpoint and restart as often as we can. But if we have the ability to detect a weak chip or a weak note early, we can retire it and maybe swap in another processor.
That ability to keep the utilization of the supercomputer high, especially when you just spent $2 billion building it, is super important. And so we put in a Ras engine, a reliability engine that does 100% self test in system test of every single gate, every single bit of memory on the Blackwell chip and all the memory that's connected to it. It's almost as if we shipped with every single chip, its own advanced tester that we test our chips with. This is the first time we're doing this super excited about it secure AI.
Only this conference today, clap for Ras the secure AI. Obviously you've just spent hundreds of millions of dollars creating a very important AI and the code, the intelligence of that AI is encoded in the parameters. You want to make sure that on the one hand, you don't lose it, on the other hand, it doesn't get contaminated. And so we now have the ability to encrypt data, of course, at rest, but also in transit. And while it's being computed, it's all encrypted. And so we now have the ability to encrypt and transmission. And when we're computing it, it is in a trusted, trusted environment, trusted engine environment.
And the last thing is decompression, moving data in and out of these nodes when the compute is so fast becomes really essential. And so we've put in a high line speed compression engine, and it effectively moves data 20 times faster in and out of these computers. These computers are so powerful and they're such a large investment. The last thing we want to do is have them be idle, and so all of these capabilities are intended to keep Blackwell fed and as busy as possible.
Overall, compared to Hopper, it is 2.5 times 2.5 times the FPA 8 performance for training per chip. It also has this new format called FP 6, so that even though the computation speed is the same, the bandwidth that's amplified because of the memory, the amount of parameters you can store in the memory is now amplified. FP 4 effectively doubles the throughput. This is vitally important for inference.
One of the things that is becoming very clear is that whenever you use a computer with AI on the other side, when you're chatting with the chat bot, when you're asking it to review or make an image, remember in the back is a GPU generating tokens. Some people call it inference, but it's more appropriately generation the way that computing has done in the past was retrieval. You would grab your phone, you would touch something, some signals go off, basically an email goes off to some storage somewhere there's prerecorded content, somebody wrote a story, or somebody made an image, or somebody recorded a video that record prerecorded content is then streamed back to the phone and recomposed in a way based on a recommender system to present the information to you. You know that in the future, the vast majority of that content will not be retrieved, and the reason for that is because that was prerecorded by somebody who doesn't understand the context, which is the reason why we have to retrieve so much content. If you can be working with an AI that understands the context, who you are, for what reason you're fetching this information, and produces the information for you just the way you like it, the amount of energy we save, the amount of networking, bandwidth we save, the amount of waste of time we save will be tremendous. The future is generative, which is the reason why we call it generative AI, which is the reason why this is a brand new industry.
The way we compute is fundamentally different. We created a processor for the generative AI era, and one of the most important parts of it is content token generation. We call it this format is FP 4.
Well, that's a lot of computation, 5x the token generation, 5x the inference capability of Hopper seems like enough. But why stop there? The answer is, it's not enough. And I'm going to show you why. I'm going to show you what. And so we would like to have a bigger GPU, even bigger than this one.
And so we decided to scale it and notice, but first, let me just tell you how we've scaled over the course of the last eight years. We've increased computation by 1000 times 8 years, 1000 times. Remember back in the good old days of Moore's Law, it was 2x, well, 5x every, well, 10x every five years, that's the easiest, easiest math, 10x every five years, 100 times every 10 years, 100 times every 10 years in the middle, in the heydays of the PC revolution, 100 times every 10 years. In the last eight years, we've gone 1000 times. We have two more years to go.
And so that puts it in perspective.
The rate at which we're advancing computing is insane, and it's still not fast enough. So we built another chip. This chip, it's just an incredible chip, we call it the nvlink switch, it's 50 billion transistors, it's almost the size of hopper all by itself. This switch ship has four envy links in it, each 1.8 TB per second. And it has computation. And as I mentioned, what is this chip for? If we were to build such a chip, we can have every single GPU talk to every other GPU at full speed at the same time. That's insane.
It doesn't even make sense. But if you could do that, if you can find a way to do that and build a system to do that, that's cost effective, that's cost effective, how incredible would it be that we could have all these Gpu's connect over a coherent link so that they effectively are one giant GPU? Well, one of the great inventions in order to make it cost effective is that this chip has to drive copy directly. The certes of this chip is just a phenomenal invention, so that we could do direct drive to copper and as a result, you can build a system that looks like this.
Now this system, this system is kind of insane. This is one dgx, this is what a dgx looks like. Now remember, just six years ago, it was pretty heavy, heavy, but I was able to lift it. I delivered the first dgx 1 to OpenAI and the researchers there. It's on, you know, the pictures that are on the internet, and we all autographed it. And if you come to my office, it's autographed there. It's really beautiful, lifted at this dgx, this dgx that dgx, by the way, was 170 teraflops If you're not familiar with the numbering system, that's 0.17 petaflops.
So this is 720. The first 1 I delivered to OpenAI was 0.17. You could round it up to 0.2, won't make any difference, but and by then it was like, wow, you know, 30 more tariffs. And so this is now 720 petaflops, almost an exaflop for training. And the world's first one exaflops machine in one rack.
Just so you know, there are only a couple, 2, 3 exaflops machines on the planet as we speak. And so this is an exa flops AI system in one single rack. Well, let's take a look at the back of it. So this is what makes it possible. That's the back, that's the that's the back, the dgx Mv link spine 130 TB per second goes through the back of that chassis. That is more than the aggregate bandwidth of the internet.
So we could basically send everything to everybody within a second. And so we have 5000 cables, 5000 Mv link cables in total, two miles. Now, this is the amazing thing. If we had to use optics, we would have had to use transceivers and retainers, and those transceivers and retainers alone would have cost 20000 W, 2 kW of just transceivers alone, just to drive the nvlink spine. As a result, we did it completely for free over nvlink switch, and we were able to save the 20 kW for computational. This entire rack is 120 kW, so that 20 kW makes a huge difference.
It's liquid cooled, what goes in is 25 degrees C about room temperature. What comes out is 45 degrees C about your Jacuzzi. So room temperature goes in, Jacuzzi comes out 2 l per second.
We could sell a peripheral.
600000 parts. Somebody used to say, you know, you guys make Gpu's and we do, but this is what a GPU looks like to me. When somebody says GPU, I see this two years ago when I saw a GPU was the hgx, it was £70, 35 parts. Our Gpu's now are $600000 parts and £3000, £3000, £3000. That's kind of like the weight of a, you know, carbon fiber Ferrari. I don't know if that's useful metric.
Everybody's going. I feel it, I feel it, I get it. I get that. Now that you mentioned that, I feel it, I don't know what's £3000? Okay, so £3000 a ton and a half, so it's not quite an elephant. So this is what a dgx looks like. Now let's see what it looks like in operation.
Okay, let's imagine what is what, How do we put this to work and what does that mean? Well, if you were to train a GP team model, 1.8 trillion parameter model, it took about apparently about, you know, 3 to 5 months or so with 25000 A If we were to do it with Hopper, it would probably take something like 8000 Gpu's and it would consume 15 MW, 8000 Gpus.
On 15 MW, it would take 90 days, about three months. And that allows you to train something that is, you know, this groundbreaking AI model. And this is obviously not as expensive as anybody would think, but it's 8000 8000 Gpu's. It's still a lot of money. And so 8000 Gpu's, 15 MW. If you were to use Blackwell to do this, it would only take 2000 Gpus, 2000 Gpu's, same 90 days. But this is the amazing part, only 4 MW of power. So from 15, that's right.
And that's, and that's our goal. Our goal is to continuously drive down the cost and the energy. They're directly proportional to each other, cost and energy associated with the computing so that we can continue to expand and scale up the computation that we have to do to train the next generation models. Well, this is training, inference or generation is vitally important going forward.
You know, probably some half of the time that Nvidia Gpus are in the cloud these days, it's being used for token generation. You know, they're either doing Copilot this or, you know, ChatGPT that, or all these different models that are being used when you're interacting with it or generating images or generating videos, generating proteins, generating chemicals. There's a bunch of generation going on. All of that is in the category of computing we call inference. But inference is extremely hard for large language models because these large language models have several properties. One, they're very large, and so it doesn't fit on 1 GPU, imagine Excel doesn't fit on 1 GPU, you know, and imagine some application you're running on a daily base doesn't fit on one computer, like a video game doesn't fit on one computer, and most in fact do. And many times in the past in hyperscale computing, many applications for many people fit on the same computer, and now all of a sudden, there's one inference application where you're interacting with this chat chatbot, That chat bot requires a supercomputer in the back to run it. And that's the future.
The future is generative with these chat bots, and these chat bots are trillions of tokens, trillions of parameters, and they have to generate tokens at interactive rates.
Now, what does that mean? Oh, well, three tokens is about a word, you know, the, you know, space, the final frontier. These are the adventures that's like, that's like 80 tokens. Okay, I don't know if that's useful to you and so. She, the art of communications is selecting good analogies. Yeah, this is, this is not going well. Every side. I don't know what he's talking about. I've never seen Star Trek. And so and so here we are, we're trying to generate these tokens. When you're interacting with it, you're hoping that the tokens come back to you as quickly as possible and as quickly as you could read it. And so the ability for generation tokens is really important.
You have to paralyze the work of this model across many, many Gpu's so that you could achieve several things. One, on the one hand, you would like throughput because that throughput reduces the cost, the overall cost per token of generating. So your throughput dictates the cost of delivering the service. On the other hand, you have another interactive rate, which is another tokens per second, where it's about per user. And that has everything to do with quality of service. And so these two things compete against each other. And we have to find a way to distribute work across all of these different Gpu's and paralyze it in a way that allows us to achieve both.
And it turns out the search space is enormous. You know, I told you there's going to be math involved and everybody's going all dear. I heard some gasp just now when I put up that slide, you know, so this right here, the Y axes is tokens per second data center throughput, the X axis is tokens per second, interactivity of the person. And notice the upper right is the best. You want interactivity to be very high number of tokens per second per user. You want the tokens per second per data center to be very high. The upper right is terrific. However, it's very hard to do that.
And in order for us to search for the best answer across every single one of those intersections, xy coordinates, okay, so you just look at every single xy coordinate. All those blue dots came from some repartitioning of the software.
Some optimizing solution has to go and figure out whether to use tensor parallel, expert parallel, pipeline parallel, or data parallel, and distribute this enormous model across all these different Gpu's and sustain the performance that you need. This exploration space would be impossible if not for the programmability of Nvidia's Gpu's. And so we could, because of, because we have such a rich ecosystem, we could explore this universe and fine, that green roof line. It turns out that green roof line, notice you've got a TP two EPA DP 4, it means two tensor parallel, tensor parallel across 2. Gpu's expert parallels cross 8 data parallel cross 4 notice on the other end, you've got tensor parallel cross 4 and expert parallel cross 16. The configuration, the distribution of that software, it's a different, different runtime that would produce these different results. And you have to go discover that roof line.
Well, that's just one model. And this is just one configuration of a computer.
Imagine all of the models being created around the world and all the different configurations of systems that are going to be available. So now that you understand the basics, let's take a look at inference of Blackwell compared to Hopper. And this is, this is the extraordinary thing in one generation, because we created a system that's designed for trillion parameter generative AI. The inference capability of Blackwell is off the charts. And in fact, it is some 30 times Hopper.
For large language models, For large language models like ChatGPT and others like it, the blue line is hopper I gave you.
Imagine we didn't change the architecture of Hopper and we just made it a bigger chip. We just use the latest, you know, greatest 10 terrible, you know, terabytes per second. We connected the two chips together. We got this giant 208 billion parameter chip. How would we have performed if nothing else changed? And it turns out quite wonderfully, quite wonderfully, and that's the purple line, but not as great as it could be.
And that's where the FP 4 Tensor Core, the new Transformer engine, and very importantly, the Envy link switch. And the reason for that is because all these Gpu's have to share the results partial products, whenever they do, all to all gather, whenever they communicate with each other. That NV link switch is communicating almost 10 times faster than what we could do in the past using the fastest networks.
Okay, so Blackwell is going to be just an amazing system for a generative AI. And in the future, in the future, data centers are going to be thought of, as I mentioned earlier, as an AI factory. An AI factory's goal in life is to generate revenues, generate in this case, intelligence in this facility, not generating electricity as an AC generators, but of the last industrial revolution and this industrial revolution, the generation of intelligence. And so this ability is super, super, super important.
The excitement of Blackwell is really off the charts. You know, when we first, when we first, you know, this is a year and a half ago, two years ago, I guess two years ago when we first started to go to market with Hopper, you know, we had the benefit of 2. Csp's joined us in a lunch and we were delighted. And so we had two customers. We have more now.
Unbelievable excitement for Blackwell. Unbelievable excitement. And there's a whole bunch of different configurations. Of course, I showed you the configurations that slide into the hopper form factor, so that's easy to upgrade. I showed you examples that are liquid cooled, that are the extreme versions of it, One entire rack that's connected by Mv link 6 72. We're going to Blackwell is going to be ramping to the world's AI companies, of which there are so many now doing amazing work in different modalities. The Csp's, every CSP is geared up all the Oems and odms regional clouds, sovereign AIS and telcos all over the world are signing up to launch with Blackwell this.
Blackwell would be the most successful product launch in our history. And so I can't wait, wait to see that. I want to thank, I want to thank some partners that are joining us in this.
AWS is gearing up for Blackwell. They're they're going to build the 1st GPU with secure AI. They're building out a 222 exaflops system. You know, just now when we animate just now the digital twin, if you saw all of those clusters are coming down, by the way, that is not just art. That is a digital twin of what we're building. That's how big it's going to be. Besides infrastructure, we're doing a lot of things together with AWS. We're kuda accelerating SageMaker AI, we're kuda accelerating Bedrock AI. Amazon Robotics is working with us using Nvidia Omniverse and Isaac Sim. AWS Health has Nvidia health integrated into it. AWS has really leaned into accelerated computing.
Google is gearing up for Blackwell, GCP already has a 100 S H-100 S T 4 SL 4 S, a whole fleet of Nvidia Kuta Gpu's, and they recently announced a Gemma model that runs across all of it. We're working to optimize and accelerate every aspect of GCP. We're accelerating data proc, which data processing their data processing engine Jax Xla tech AI and mujo for robotics. So we're working with Google and GCP across a whole bunch of initiatives.
Oracle is gearing up for Blackwell. Oracle is a great partner of ours for Nvidia dgx Cloud. And we're also working together to accelerate something that's really important to a lot of companies. Oracle Database, Microsoft is accelerating and Microsoft is gearing up for Blackwell. Microsoft Nvidia has a wide ranging partnership where accelerating could accelerating all kinds of services when you when you chat obviously and AI services that are in Microsoft Azure, it's very, very likely Nvidia's in the back doing the inference and the token generation we built, they built the largest Nvidia Infiniband supercomputer, basically a digital twin of ours or a physical twin of ours. We're bringing the Nvidia ecosystem to Azure Nvidia digs cloud to Azure. Nvidia Omniverse is now hosted in Azure. Nvidia healthcare is an Azure and all of it is deeply integrated and deeply connected with Microsoft Fabric.
The whole industry is gearing up for Blackwell. This is what I'm about to show you. Most of the scenes that you've seen so far of Blackwell are the full fidelity design of Blackwell.
Everything in our company has a digital twin, and in fact, this digital twin idea is really spreading and it helps companies build very complicated things perfectly the first time. And what could be more exciting than creating a digital twin to build a computer that was built in a digital twin? And so let me show you what wistron is doing. To meet the demand for Nvidia accelerated computing, wiston one of our leading manufacturing partners is building digital twins of Nvidia dgx and hgx factories using custom software developed with Omniverse Sdks and Apis for their newest factory westron started with the digital twin to virtually integrate their multi-cad and process simulation data into a unified view, testing and optimizing layouts in this physically accurate digital environment increased worker efficiency by 51%. During construction, the Omniverse Digital twin was used to verify that the physical build matched the digital plans, identifying any discrepancies early has helped avoid costly change orders, and the results have been impressive using a digital twin helped bring wistrand factory online in half the time, just 2.5 months instead of 5 in operation, the Omniverse Digital Twin helps westron rapidly test new layouts to accommodate new processes or improve operations in the existing space and monitor real time operations using live IoT data from every machine on the production line line, which ultimately enabled wistron to reduce end to end cycle times by 50% and defect rates by 40% with Nvidia AI and Omniverse Nvidia's global ecosystem of partners are building a new era of accelerated AI enabled digitalization.
That's how we are. That's the way it's going to be in the future when I'm manufacturing everything digitally first, and then we'll manufacture it physically. People ask me, how did it start? What got you guys so excited? What was it that you saw that caused you to put it all in on this incredible idea? And it's this. Hang on a second.
Guys, that was going to be such a moment. That's what happens when you don't rehearse.
This, as you know, was first contact 2012 Alexnet. You put a cat into this computer and it comes out and it says cat. And we said, oh, my God, this is going to change everything.
You take 1 million numbers, you take 1 million numbers across three channels, RGB, these numbers make no sense to anybody. You put it into this software and it compress it dimensionally, reduce it, it reduces it from a million dimensions, 1 million dimensions, it turns it into three letters, one vector, 1 number. And it's generalized.
You could have the cat be different cats and you could have it be the front of the cat and the back of the cat. And you look at this thing. Is it unbelievable? You mean any cats? Yeah, any cat. And it was able to recognize all these cats. And we realized how it did it systematically, Structurally, it's scalable. How big can you make it? Well, how big do you want to make it? And so we imagine that this is a completely new way of writing software. And now today, as you know, you can have you type in the word cat. And what comes out is a cat. It went the other way, am I right?
Unbelievable, how is it possible? That's right, how is it possible? You took three letters and you generated a million pixels from it and it made sense?
Well, that's the miracle and here we are, just literally 10 years later, 10 years later, where we recognize texts, we recognize images, we recognize videos and sounds and images. Not only do we recognize them, we understand their meaning, we understand the meaning of the text. That's the reason why I can chat with you, it can summarize for you, it understands the text, it understood, not just recognizes the English, it understood the English, it doesn't just recognize the pixels, it understood the pixels. And you can, you can even condition it between two modalities. You can have language conditioned image and generate all kinds of interesting things. Well, if you can understand these things, what else can you understand that you've digitized the reason why we started with texts and you know images is because we digitized those, but what else have we digitized? Well, it turns out we digitized a lot of things, proteins and genes and brainwaves, anything you can digitize, so long as there's structure, we can probably learn some patterns from it, and if we can learn the patterns from it, we can understand its meaning, if we can understand its meaning, we might be able to generate it as well. And so therefore, the generative AI revolution is here, well, what else can we generate, what else can we learn?
Well, one of the things that we would love to learn, we would love to learn is we would love to learn climate, we would love to learn, extreme weather, we would love to learn what, how we can predict future, future weather, weather at regional scales at sufficiently high resolution such that we can keep people out of harm's way before harm comes.
Extreme weather. Weather cost the world $150 billion, surely more than that. It's not evenly distributed, $150 billion is concentrated in some parts of the world. And of course, to some people of the world. We need to adapt and we need to know what's coming. And so we are creating Earth Ii, a digital twin of the Earth, for predicting weather. And we've made an extraordinary invention called the ability to use generative AI to predict weather at extremely high resolution. Let's take a look.
As the Earth's climate changes AI powered weather forecasting is allowing us to more accurately predict and track severe storms like super typhoon chanthu, which caused widespread damage in Taiwan and the surrounding region in 2021. Current AI forecast models can accurately predict the track of storms, but they are limited to 25 kilometre resolution, which can miss important details.
Nvidia's Cardiff is a revolutionary new generative AI model trained on high resolution radar, assimilated wharf weather forecasts and Era 5 reanalysis data using Cordis. Extreme events like chantu can be super resolved from 25 kilometre to 2 kilometre resolution with 1000 times the speed and 3000 times the energy efficiency of conventional weather models.
By combining the speed and accuracy of Nvidia's weather forecasting model, forecast net, and generative AI models like Cordy, we can explore hundreds or even thousands of kilometre scale regional weather forecasts to provide a clear picture of the best, worst and most likely impacts of a storm. This wealth of information can help minimise loss of life and property damage. Today cordifer is optimised for Taiwan, but soon generative super sampling will be available as part of the Nvidia Earth Ii inference service for many regions across the down.
The weather company is to trust the source of global weather prediction. We are working together to accelerate their weather simulation, first principled base of simulation. However, they're also going to integrate Earth to Cork so that they could help businesses and countries do regional high resolution weather prediction. And so if you have some weather prediction you'd like to know you'd like to do, reach out to the weather company.
Really exciting, really exciting work. Nvidia Healthcare, something we started 15 years ago. We're super excited about this. This is an area where we're very, very proud, whether it's medical imaging or gene sequencing or computational chemistry, it is very likely that Nvidia is the computation behind it. We've done so much work in this area.
Today, we're announcing that we're going to do something really, really cool. Imagine all of these AI models that are being used to generate images and audio, but instead of images and audio, because it understood images and audio, all the digitization that we've done for genes and proteins and amino acids, that digitization capability is now passed through machine learning so that we understand the language of life, the ability to understand the language of life. Of course, we saw the first evidence of it with alpha fold. This is really quite an extraordinary thing after decades of painstaking work, the world had only digitized and reconstructed using chiro electron microscopy or X-ray crystallography. These different techniques painstakingly reconstructed the protein, 200000 of them, in just, what is it, less than a year or so, alpha fold has reconstructed 200 million proteins, Basically every protein of every living thing that's ever been sequenced. This is completely revolutionary.
Well, those models are incredibly hard to use for, incredibly hard for people to build. And so what we're going to do is we're going to build them. We're going to build them for, the researchers around the world. And it won't be the only one. There will be many other models that we create. And so let me show you what we're going to do with it.
Virtual screening for new medicines is a computationally intractable problem. Existing techniques can only scan billions of compounds and require days on thousands of standard compute nodes to identify new drug candidates. Nvidia Biome Nims enable a new generative screening paradigm using Nims for protein structure prediction with alpha fold molecule generation with mole Mim, and docking with diff DOC, we can now generate and screen candidate molecules in a matter of minutes, malim can connect to custom applications to steer the generative process iteratively optimising for desired properties. These applications can be defined with bione mo microservices or built from scratch. Here, a physics based simulation optimises for a molecule's ability to bind to a target protein while optimising for other favourable molecular properties. In parallel, mulm generates high quality drug like molecules that bind to the target and are synthesizable, translating to a higher probability of developing successful medicines. Faster bione Mo is enabling a new paradigm in drug discovery, with Nims providing on demand microservices that can be combined to build powerful drug discovery workflows like de novo protein design or guided molecule generation. For virtual screening, bione Mo Nims are helping researchers and developers reinvent computational drug design.
I nvidia momen molem core diff, there's a whole bunch of other models, whole bunch of other models, computer vision models, robotics models, and even, of course, some really, really, really terrific open source language models. These models are groundbreaking. However, it's hard for companies to use. How would you use it? How would you bring it into your company and integrate it into your workflow? How would you package it up and run it? Remember earlier I just said that inference is an extraordinary computation problem. How would you do the optimization for each and every one of these models and put together the computing stack necessary to run that supercomputer so that you can run these models in your company?
And so we have a great idea.
We're going to invent a new way and invent a new way for you to receive and operate software. This software comes basically in a digital box. We call it a container, and we call it the Nvidia inference microservice, a Nim, and let me explain to you what it is, and now it's a pre trained model, so it's pretty clever, and it is packaged and optimized to run across Nvidia's installed base, which is very, very large. What's inside it is incredible. You have all these pretrained state of the art open source models, they could be open source, they could be from one of our partners, it could be created by us, like Nvidia moment, it is packaged up with all of its dependencies, so kuda the right version, q DN, the right version, tensor RT LM distributing across the multiple Gpu's, try an inference server, all completely packaged together. It's optimized depending on whether you have a single GPU, multi GPU or multi node of Gpu's, it's optimized for that and it's connected up with Apis that are simple to use.
Now this, think about what an AI API is. An AI API is an interface that you just talked to. And so this is a piece of software in the future that has a really simple API, and that API is called human, and these packages incredible bodies of software will be optimized and packaged, and we'll put it on a website and you can download it, you could take it with you could run it in any cloud, you could run it in your own data center, you can run in workstations of it fit, and all you have to do is come to AI dot Nvidia dot com, we call it Nvidia inference microservice, but inside the company, we all call it nims chan.
Just imagine, you know, one of some day there's going to be one of these chat bytes. And these chat bytes is going to just be in a Nim. And you'll assemble a whole bunch of chat bots. And that's the way software is going to be built someday.
How do we build software in the future? It is unlikely that you'll write it from scratch or write a whole bunch of Python code or anything like that. It is very likely that you assemble a team of Ai's. There's probably going to be a super AI that you use that takes the mission that you give it and breaks it down into an execution plan. Some of that execution plan could be handed off to another Nim, that Nim would maybe understand.
Sap, the language of Sap is Abap. It might understand ServiceNow and go retrieve some information from their platforms. It might then hand that result to another Nim who goes off and does some calculation on it. Maybe it's an optimization software, a combinatorial optimization algorithm, maybe it's just some basic calculator. Maybe it's pandas to do some numerical analysis on it. And then it comes back with its answer, and it gets combined with everybody else's. And because it's been presented with, this is what the right answer should look like, it knows what right answer is to produce, and it presents it to you.
We can get a report every single day at, you know, top of the hour that has something to do with a build plan or some forecast or some customer alert or some bugs database or whatever it happens to be. And we could assemble it using all these Nims. And because these Nims have been packaged up in ready to work on your systems so long as you have video Gpus in your data center in the cloud, this, this Nims will work together as a team and do amazing things. And so we decided this is such a great idea. We're going to go do that. And so Nvidia has Nims running all over the company.
We have chat bots being created all over the place, and one of the most important chat bots, of course, is a chip designer chat bot. You might not be surprised. We care a lot about building chips, and so we want to build chatbots AI Copilots that are co designers with our engineers. And so this is the way we did it. So we got ourselves a Lama 2, this is a 70 B, and it's, you know, packaged up in a Nim.
And we asked it, you know, what is a CTL? Well, it turns out CTL is an internal program and it has an internal proprietary language, but it thought the CTL was a combinatorial timing logic. And so it describes, you know, conventional knowledge of CTL, but that's not very useful to us. And so we gave it a whole bunch of new examples. You know, this is no different than employee onboarding an employee. And we say, you know, thanks for that answer. It's completely wrong. And then we present to them, this is what a CTL is, okay? And so this is what a CTL is at Nvidia. And the CTL, as you can see, you know CTL stands for Compute Trace Library, which makes sense.
You know, we were tracing compute cycles all the time and it wrote the program. Is that amazing?
And so the productivity of our chip designers can go up. This is what you can do with a Nim.
First thing you can do with it's customize it. We have a service called Nemo Microservice that helps you curate the data, preparing the data so that you could teach this onboard this AI, you fine tune them, and then you guardrail it. You can even evaluate the answer, evaluate its performance against other examples. And so that's called the Nemo microservice.
Now the thing that's that's emerging here is this, there are three elements, 3 pillars of what we're doing. The first pillar is of course, inventing the technology for AI models and running AI models and packaging it up for you. The second is to create tools to help you modify it. First is having the AI technology, second is to help you modify it, and third is infrastructure for you to find, tune it, and if you like deploy it, you could deploy it on our infrastructure called dgx cloud, or you can deploy it on prem, you could deploy it anywhere you like. Once you develop it, it's yours to take anywhere. And so we are effectively an AI foundry we will do for you and the industry on AI what TSMC does for us building chips and so we go to it with our go to TSMC with our big ideas, they manufacture it and we take it with us. And so exactly the same thing here. AI Foundry and the three pillars are the Nims Nemo microservice and dgx cloud.
The other thing that you could teach the Nim to do is to understand your proprietary information. Remember, inside our company, the vast majority of our data is not in the cloud, it's inside our company, it's been sitting there, you know, being used all the time and gosh, it's basically Nvidia's intelligence. We would like to take that data, learn its meaning, like we learned the meaning of almost anything else that we just talked about, learn its meaning, and then reindex that knowledge into a new type of database called the vector database. And so you essentially take structured data or unstructured data, you learn its meaning, you encode its meaning. So now this becomes an AI database. And that AI database in the future, once you create it, you can talk to it.
And so let me give you an example of what you could do. So suppose you create, you've got a whole bunch of multi modality data, and one good example of that is PDF. So you take the PDF, you take all of your Pdf's to all your favorite, you know, the stuff that is proprietary to you, critical to your company. You can encode it just as we encoded pixels of a cat, and it becomes the word cat. We can encode all of your PDF and it turns into vector that are now stored inside your vector database. It becomes the proprietary information of your company. And once you have that proprietary information, you could check to it.
It's a smart database. And so you just chat with data. And how much more enjoyable is that?
You know, for our software team, you know, they just chat with the bugs database, you know, how many bugs was there last night? Are we making any progress? And then after you're done talking to this bugs database, you need therapy. And so we have another chat bot for you. You can do it.
Okay, so we call this Nemo retriever. And the reason for that is because ultimately its job is to go retrieve information as quickly as possible. And you just talk to it, hey, retrieve this information. It goes, oh, it brings it back to you. Is it, do you mean this? You go, yeah, perfect. Okay, and so we call it the Nemo Retriever. Well, the Nemo service helps you create all these things.
And we have all these different Nims. We even have Nims of digital humans.
I'm Rachel, your AI care manager, manager. Okay, so it's a really short clip, but there were so many videos to show you, I guess, so many other demos to show you, And so I had to cut this one short. But this is Diana. She is a digital human Nim, and you just talked to her, and she's connected in this case to Hippocratic Ai's large language model for healthcare, and it's truly amazing. She is just super smart about healthcare things, you know? And so after my Dwight, my VP of software engineering, talks to the chatbot for Bugs database, then you come over here and talk to Diane. And so Diane is completely animated with AI, and she's a digital human.
There's so many companies that would like to build. They're sitting on gold mines. The enterprise It industry is sitting on a gold mine. It's a gold mine because they have so much understanding of the way work is done. They have all these amazing tools that have been created over the years, and they're sitting on a lot of data if they could take that gold mine and turn them into Copilots, these Copilots could help us do things. And so just about every It franchise It platform in the world that has valuable tools that people use is sitting on a gold mine for Copilots. And they would like to build their own Copilots and their own chat bots. And so we're announcing that Nvidia AI foundry is working with some of the world's great companies.
Sap generates 87% of the world's global commerce. Basically, the world runs on Sap. We run on an Sap.
Nvidia and Sap are building Sap Juul co-pilots using Nvidia Nemo and dgx Cloud ServiceNow, they run 85% of the world's Fortune 500 company, run their people and customer service operations on ServiceNow, and they're using Nvidia AI Foundry to build ServiceNow assist virtual assistance cohesity backs up the world's data. They're sitting on a gold mine of data, hundreds of exabytes of data, over 10000 companies. Nvidia AI Foundry is working with them, helping them build their Gaia generative AI agent Snowflake is a company that stores the world's digital warehouse in the cloud and serves over 3 billion queries a day for 10000 enterprise customers. Snowflake is working with Nvidia AI Foundry to build Copilots with Nvidia Nemo and Nims NetApp. Nearly half of the files in the world are stored on Prem on NetApp, and Video AI Foundry is helping them build chatbots and Copilots like those vector databases and retrievers with Nvidia Nemo and Nims, and we have a great partnership with Dell, Everybody who, everybody who is building these chat bots and generative AI, when you're ready to run it, you're going to need an AI factory, and nobody is better at building end to end systems of very large scale for the enterprise than Dell. And so anybody, any company, every company will need to build AI factories.
And it turns out that Michael is here. He's happy to take your order.
Ladies and gentlemen, Michael Tel. Okay, let's talk about the next wave of robotics, the next wave of AI, robotics, physical AI. So far, all of the AI that we've talked about is one computer data comes into one computer in lots of the worlds, if you will experience in digital text form, the AI imitates us by reading a lot of the language to predict the next words. It's imitating you by studying all of the patterns and all the other previous examples. Of course, it has to understand context and so on and so forth, but once it understands the context is essentially imitating you, we take all of the data, we put it into a system like dgx, we compress it into a large language model, trillions and trillions of parameters become billions and billions, trillions of tokens becomes billions of parameters, these billions of parameters becomes your AI, well, in order for us to go to the next wave of AI where the AI understands the physical world, we're going to need three computers, the first computer is still the same computer, it's the AI computer that now it's going to be watching video, and maybe it's doing synthetic data generation, and maybe there's a lot, lot of human examples, just as we have human examples in text form, we're going to have human examples in articulation form, and the AIS will watch us understand what is happening and try to adapt it for themselves into the context, and because it can generalize with these foundation models, maybe these robots can also perform in the physical world a fairly generally. So I just described in very simple terms, essentially what just happened in large language models, except the ChatGPT moment for robotics may be right around the corner. And so we've been building the end to end systems for robotics for some time. I'm super, super, super proud of the work.
We have the AI system, dgx we have the lower system, which is called agx for autonomous systems, the world's first robotics processor. When we first built this thing, people are, what are you guys building? It's a SOC, it's 1 chip, it's designed to be very low power, but it's designed for high speed sensor processing and AI. And so if you want to run transformers in a car or you want to run Transformers in a, you know, anything that moves, we have the perfect computer for you. It's called the Jetson. And so the dgx on top are training the AI, the Jetson is the autonomous processor, and in the middle we need another computer, whereas large language models have to benefit of you providing your examples and then doing reinforcement learning, human feedback.
What is the reinforcement learning, human feedback of a robot? Well, it's reinforcement learning, physical feedback, that's how you align the robot, that's how the robot knows that as it's learning these articulation capabilities and manipulation capabilities, it's going to adapt properly into the laws of physics. And so we need a simulation engine that represents the world digitally for the robot so that the robot has a gym to go learn how to be a robot. We call that virtual world Omniverse, and the computer that runs Omniverse is called OFX and OVC, the computer itself is hosted in the Azure Cloud, okay? And so basically we built these three things, these three systems on top of it. We have algorithms for every single one.
Now I'm going to show you one super example of how AI and Omniverse are going to work together.
The example I'm going to show you is kind of insane, but it's going to be very, very close to tomorrow. It's a robotics building. This robotics building is called a warehouse. Inside the robotics building are going to be some autonomous systems. Some of the autonomous systems are going to be called humans, and some of the autonomous systems are going to be called forklifts. And these autonomous systems, we're going going to interact with each other, of course, autonomously, and it's going to be overlooked upon by this warehouse to keep everybody out of harm's way. The warehouse is essentially an air traffic controller. And whenever it sees something happening, it will redirect traffic and give new waypoints, just new waypoints to the robots and the people, and they'll know exactly what to do.
This warehouse, this building you can also talk to, of course, you could talk to it, hey, you know, sap center, how are you feeling today, for example? And so you could ask the same the warehouse, the same questions. Basically the system I just described will have Omniverse Cloud that's hosting the virtual simulation and AI running on dgx cloud, and all of this is running in real time.
Let's take a look. The future of Heavy Industries starts as a digital twin. The AI agents helping robots, workers, and infrastructure navigate unpredictable events in complex industrial spaces will be built and evaluated first in sophisticated digital twins.
This Omniverse digital twin of a 100000 square foot warehouse is operating as a simulation environment that integrates digital workers amrs running the Nvidia Isaac receptor stack. Centralized activity maps of the entire warehouse from 100 simulated ceiling mount cameras using Nvidia Metropolis and Amr route planning with Nvidia coups software in loop testing of AI agents in this physically accurate simulated environment enables us to evaluate and refine how the system adapts to real world unpredictability. Here, an incident occurs along this amr's planned route, blocking its path as it moves to pick up a pallet. Nvidia Metropolis updates and sends a real time occupancy map to kuop, where a new optimal route is calculated. The Amr is enabled to see around corners and improve its mission efficiency with generative AI powered Metropolis Vision Foundation models. Operators can even ask questions using natural language. The visual model understands nuanced activity and can offer immediate insights to improve operations.
All of the sensor data is created in simulation and passed to the real time AI running as Nvidia inference, microservices, or nyms. And when the AI is ready to be deployed in the physical twin, the real warehouse, we connect Metropolis and Isaac Nims to real sensors with the ability for continuous improvement of both the digital twin and the AI models.
Is that incredible? And so? Remember, remember, a future facility, warehouse, factory building will be software defined. And so the software is running. How else would you test the software? So you test the software to building the warehouse, the optimization system in the digital twin, what about all the robots, all of those robots you are seeing just now, they're all running their own autonomous robotic stack. And so the way you integrate software in the future, CICD in the future for robotic systems is with digital twins.
We've made Omniverse a lot easier to access. We're going to create basically Omniverse Cloud Apis for simple API in a channel, and you can connect your application to it. So this is, this is going to be as wonderfully, beautifully simple in the future that Omniverse is going to be. And with these Apis, you're going to have these magical digital twin capability.
We also have turned Omniverse into an AI and integrated it with the ability to chat USD. The language of our language is, you know, human, and Omniverse is language, as it turns out, is universal scene description. And so that language is a rather complex. And so we've taught our Omniverse that language. And so you can speak to it in English, and it would directly generate USD and it would talk back in USD, but converse back to you in English.
You could also look for information in this world semantically instead of the world being encoded semantically in language, now it's encoded semantically in scenes. And so you could ask it of certain objects or certain conditions or certain scenarios, and it can go and find that scenario for you. It also can collaborate with you in generation. You could design some things in 3D, it could simulate some things in 3D, or you could use AI to generate something in 3D. Let's take a look at how this is all going to work.
We have a great partnership with Siemens. Siemens is the world's largest industrial engineering and operations platform. You've seen now so many different companies in the industrial space, heavy industries is one of the greatest final frontiers of it, and we finally now have necessary technology to go and make a real impact. Siemens is building the industrial metaverse, and today we're announcing that Siemens is connecting their crown jewel accelerator, 2 Nvidia Omniverse. Let's take a look.
Seamus technology is transformed every day for everyone. Team Center X, our leading product lifecycle management software from the Siemens Accelerator platform, is used every day by our customers to develop and deliver products and scale.
Now we are bringing the real and digital worlds even closer by integrating Nvidia AI and Omniverse technologies into team centers. Omnibus Apis enabled data interoperability, and physics based rendering to industrial scale design and manufacturing projects. Our customers HD und market leader in sustainable ship manufacturing, builds ammonia and hydrogen power chips, often comprising over 7 million discrete parts. Omniverse Apis teams and the companies like HD Hongdae unify and visualize these massive engineering data sets interactively and integrate general AI to generate 3D objects or Hdri backgrounds to see their projects in context. The result? An ultra intuitive, photoreal physics based digital twin that eliminates waist and arrows, delivering huge savings in cost and time. And we are building this for collaboration, whether across more Siemens accelerator tools like Siemens Annex or Star CCM Plus, or across teams working on their favorite devices in the same scene together.
And this is just the beginning. Working with ividia, we will bring accelerated computing, generative AI, and omnivore integration across the Siemens Accelerator port 4D. Let's.
The professional voice actor happens to be a good friend of mine, Roland Bush, who happens to be the CEO of Siemens.
Once you get Omniverse connected into your workflow, your ecosystem, from the beginning of your design to engineering to manufacturing planning all the way to digital twin operations, once you connect everything together, it's insane how much productivity you can get. And it's just really, really wonderful. All of a sudden, everybody's operating on the same ground. Truth, you don't have to exchange data and convert data, make mistakes. Everybody is working on the same ground truth from the design department to the art department, the architecture department, all the way to the engineering and even the marketing department. Let's take a look at how Nissan has integrated Omniverse into their workflow, and it's all because it's connected by all these wonderful tools and these developers that we're working with. Take a look.
Mei Zhong yan zhe dong shi jiuan chean FA chun du te de Meili. I'm.
I'm.
That was not an animation. That was Omniverse. Today we're announcing that Omniverse Cloud streams to the Vision Pro and.
It is very, very, very strange that you walk around virtual doors. When I was getting out of that car and everybody does it. It is really, really quite amazing. Vision Pro connected to Omniverse portals you into Omniverse, and because all of these CAD tools and all these different design tools are now integrated and connected to Omniverse, you can have this type of workflow. Really incredible.
Let's talk about robotics. Everything that moves will be robotic, there's no question about that. It's safer, it's more convenient, and one of the largest industries is going to be a automotive. We build the robotic stack from top to bottom, as I was mentioned from the computer system, but in the case of self driving cars, including the self driving car application at the end of this year, or I guess beginning of next year, we will be shipping in Mercedes and then shortly after that, JLR. And so these autonomous robotic systems are software defined. They take a lot of work to do, has computer vision, has obviously artificial intelligence control and planning, all kinds of very complicated technology and takes years to refine.
We're building the entire stack. However, we open up our entire stack for all of the automotive industry. This is just the way we work, the way we work in every single industry. We try to build as much of it as we can so that we understand it, but then we open it up so everybody can access it.
Whether you would like to buy just our computer, which is the world's only full functional, safe Ald system that can run AI, this functional safe ACL D quality computer, or the operating system on top, or of course, our data centers, which is in basically every Av company in the world, However, you would like to enjoy it, we're delighted by it.
Today, we're announcing that BYD, the world's largest OVC, is adopting our next generation. It's called Thor. Thor is designed for transformer engines. Thor, our next generation Av computer will be used by BYD.
You probably don't know this fact that we have over a million robotics developers. We created Jetson, this robotics computer. We're so proud of it. The amount of software that goes on top of it is insane. But the reason why we can do it at all is because it's 100% could have compatible everything that we do. Everything that we do in our company is in service of our developers. And by us being able to maintain this rich ecosystem and make it compatible with everything that you access from us, we can bring all of that incredible capability to this little tiny computer we call Jetson, a robotics computer.
We also today are announcing this incredibly advanced new SDK, we call it Isaac perceptor Isaac perceptor most of the robots today are pre-programmed.
They're either following rails on the ground, digital rails, or they'd be following April tags. But in the future, they're going to have perception. And the reason why you want that is so that you could easily program it. You say, would you like to go from point A to point B? And it will figure out a way to navigate its way there. So by only programming waypoints, the entire route could be adaptive. The entire environment could be re-programmed, just as I showed you at the very beginning with the warehouse. You can't do that with pre-programmed agv's. If those boxes fall down, they just all gum up and they just wait there for somebody to come clear it.
And so now with the Isaac perceptor, we have incredible state of the art vision odometry, 3D reconstruction, and in addition to 3D reconstruction depth perception. The reason for that is so that you can have two modalities to keep an eye on what's happening in the world. Isaac, Isaac perceptor the most used robot today, is a manipulator manufacturing arms and they are also pre programmed.
The computer vision algorithms, the AI algorithms, the control and path planning algorithms that are geometry aware, incredibly computationally intensive. We have made these kuda accelerated, so we have the world's first kuda accelerated motion planner that is geometry aware. You put something in front of it, it comes up with a new plan and articulates around it. It has excellent perception for Poe's estimation of a 3D object, not just not it's pose in 2D, but it's pose in 3D, So it has to imagine what's around and how best to grab it. So the foundation pose, the grip foundation, and the articulation algorithms are now available. We call it Isaac manipulator. And they also just run on V as computers.
We are starting to do some really great work in the next generation of robotics. The next generation of robotics will likely be a human Ord robotics.
We now have the necessary technology and as I was describing earlier, the necessary technology to imagine generalized human robotics. In a way, human robotics is likely easier. And the reason for that is because we have a lot more imitation training data that we can provide the robots, because we are constructed in a very similar way, it is very likely that the human robotics will be much more useful in our world because we created the world to be something that we can interoperate in and work well in. And the way that we set up our workstations and many factoring and logistics, they were designed for humans, they were designed for people. And so these human or robotics will likely be much more productive to deploy while we're creating just like we're doing with the others, the entire stack starting from the top a foundation model that learns from watching video human examples.
It could be in video form, it could be in virtual reality form. We then created a gym for it called Isaac Reinforcement Learning Gym, which allows the humanoid robot to learn how to adapt to the physical world, and then an incredible computer, the same computer that's going to go into a robotic car, this computer will run inside a human or robot called Thor. It's designed for transformer engines. We've combined several of these into one video.
This is something that you're going to really love, take a look. It's not enough for humans to imagine.
We have to invent. And explore and push beyond what's been done a fair amount of detail.
We create smarter and faster. We push it to fail so it can learn. We teach it, then help it teach itself. We broaden its understanding. To take on new challenges. With absolute precision. I'm succeed. We make it perceive and move. And even reason. So it can share our world with us.
This is where inspiration leads us. The next frontier. This is Nvidia project great. A general purpose foundation model for humanoid robot learning, The group model takes multimodal instructions and past interactions as input and produces the next action for the robot to execute. We developed Isaac Lab, a robot learning application to train group on Omniverse, Isaac Sim, and we scale out with Osmo, a new compute orchestration service that coordinates workflows across dgx systems for training and O Vx systems for simulation. With these tools, we can train Groot in physically based simulation and transfer zero shot to the real world.
The group model will enable a robot to learn from a handful of human demonstrations so it can help with everyday tasks. And emulate human movement just by observing us. This is made possible with Nvidia's technologies that can understand humans from videos, train models, and simulation, and ultimately deploy them directly to physical robots. Connecting group to a large language model even allows it to generate motions by following natural language instructions.
Hi Joe, one here, give me a high five. You're big. Let's high 5 5, can you give us some komos dirt dirt, check this out. All this incredible intelligence is powered by the new Jets and Thor Robotics chips designed for group built for the future with Isaac Lab Osmo and Groot, we're providing the building blocks for the next generation of AI powered robotics.
About the same size.
The soul of avidia, the intersection of computer graphics, physics, artificial intelligence, it all came to bear at this moment. The name of that project? General Robotics 0 0 3. I know, super good. Super good. Well, I think we have some special guests. Two weeks?
Hey, guys. So I understand you guys are powered by Jetson. They're powered by Jetsons, little Jetson Robotics computers inside. They learned a walk in Isaac Sim. Ladies and gentlemen, this, this is orange and this is the famous green. They are the bdx robots of Disney. Amazing, amazing Disney research. Come on, you guys, let's wrap up. Let's go. Five things. Where you going? I sit right here. Don't be afraid. Come here. Green. Hurry up. What do he saying? No, it's not time to eat. To help can be. I'll give you a snack in a moment. Let me finish up real quick. Come on green, hurry, hurry up, Stop wasting time Five things, 5 things.
First, a new industrial revolution, every data center should be accelerated, a trillion dollars worth of installed data centers will become modernized over the next several years. Second, because of the computational capability we brought to bear, a new way of doing software has emerged. Generative AI, which is going to create new infrastructure dedicated to doing one thing and one thing only, not for multi user data centers, but AI generators. These AI generation will create incredibly valuable software, a new industrial revolution.
Second, the computer of this revolution, the computer of this generation, generative AI, trillion parameters, Blackwell insane amounts of computers and computing. Third I'm trying to concentrate. Good job.
Third, new computer New computer creates new types of software. New types of software should be distributed in a new way that it can, on the one hand, be an endpoint in the cloud and easy to use, but still allow you to take it with you because it is your intelligence. Your intelligence should be packaged up in a way that allows you to take it with you. We call them Nims. And third, these Nims are going to help you create a new type of application for the future, not one that you wrote completely from scratch, but you're going to integrate them like teams create these applications. We have a fantastic capability between Nims, the AI technology, the tools Nemo, and the infrastructure dgx cloud in our AI Foundry to help you create proprietary applications, proprietary chatbots, and then lastly, everything that moves in the future will be robotic.
You're not going to be the only one. And these robotic systems, whether they are humanoid, amrs, self driving cars, forklifts, manipulating arms, they will all need one thing. Giant stadiums, warehouses, factories. There can be factories that are robotic, orchestrating factories, manufacturing lines that are robotics, building cars that are robotics. These systems all need one thing. They need a platform, a digital platform, a digital twin platform, and we call that Omniverse, the operating system of the robotics world.
These are the five things that we talked about today. What does Nvidia look like? What does Nvidia look like? When we talk about Gpu's, there's a very different image that I have when I, when people ask me about Gpu's first, I see a bunch of software stacks and things like that. And second, I see this, this is what we announced to you today, this is Blackwell, this is the platform.
Amazing processors, mv link switches, networking systems, and the system design is a miracle. This is Blackwell and this to me is what a GPU looks like in my mind.
Listen, orange, green. I think we have one more treat for everybody. What do you think? Should we, okay, we have one more thing to show you Rollins.
Thank you. Thank you, Have a great, have a great GTC Thank you all for coming. Thank you.
微信扫码关注该文公众号作者