The hardware behind ChatGPT

  • 3 Replies
  • 4662 Views
  • Saka's Avatar
    Level 52
    ChatGPT from OpenAI is currently an extremely popular tool. It is based on the GPT-3 language model. Thanks to the very intensive deep learning it went through, it can generate quite sensible text, depending on the use case. It is still not recommended for scientific purposes, as it is known to produce something akin to "believable bullsh*t".

    How intensive was the training, actually?
    UBS analyst Timothy Arcuri claims ChatGPT used 10,000 Nvidia GPUs to train the model, and that number is listed in multiple sources. The GPUs in question are Nvidia V100s. The price of each unit often exceeds $10,000, which made the project really expensive. Microsoft built the supercomputer for OpenAI but refused to disclose the cost, saying only that it was "several hundred million" dollars. An interesting fact: by 2020, when the supercomputer was built, the Volta-based V100 was already fairly old tech, but its Ampere successor came out a bit too late for the project, as an investment of that scale needs thorough planning.
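    As a quick sanity check on those figures, here is a back-of-envelope calculation (the per-unit price is just the rough "$10,000" ballpark from above, not an exact quote, and it covers the GPUs alone, not networking, storage, or power):

    ```python
    # Rough estimate of the GPU hardware cost for training.
    # Both numbers are the approximate figures quoted in the post.
    num_gpus = 10_000          # V100s reportedly used for training
    price_per_gpu = 10_000     # USD per unit, "often exceeds $10,000"

    gpu_cost = num_gpus * price_per_gpu
    print(f"GPU hardware alone: ${gpu_cost:,}")  # GPU hardware alone: $100,000,000
    ```

    So the GPUs alone land around $100 million, which fits Microsoft's "several hundred million" once you add the rest of the supercomputer around them.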


    Now that it is past the training stage, ChatGPT currently runs on eight A100 GPUs.

    ServeTheHome has a really nice article describing these GPUs. Basically, they are a product designed for servers handling big data and similar workloads. Hence they have no fans or display outputs, and they are equipped with massive heatsinks that allow them to dissipate large amounts of heat while being stacked densely.

    [Image: nvidia-a100-pcie-3qtr-top-left-2c50-d.jpg — the Nvidia A100 PCIe card]

    The spec sheet is on Nvidia's website. One noticeable thing is the huge amount of VRAM, which is what allows large amounts of data to be processed in a single batch. To give some idea: on my mobile RTX 3070 I cannot run a diffusion model on a picture larger than about 250x250, because my 8 GB of VRAM isn't enough. To generate a wallpaper-sized picture you pretty much need a server-level GPU or a workstation with several professional GPUs.
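    To illustrate why resolution hits a VRAM wall so quickly: activation memory in a diffusion model grows roughly with the pixel count, i.e. quadratically with the side length. The sketch below uses a made-up per-pixel constant calibrated so that 250x250 lands near my 8 GB limit; real numbers depend heavily on the model and precision.

    ```python
    # Very rough activation-memory estimate for one diffusion step.
    # The constant is illustrative only (picked so 250x250 ~ 8 GB),
    # not measured from any actual model.
    def approx_vram_gb(side_px, bytes_per_pixel=120_000):
        return side_px * side_px * bytes_per_pixel / 1e9

    for side in (250, 512, 1024, 1920):
        print(f"{side}x{side}: ~{approx_vram_gb(side):.1f} GB")
    ```

    Even with a generous error margin, a 1920x1920 wallpaper needs tens of times more memory than a 250x250 image, which is exactly why the A100's 40-80 GB of VRAM matters for this kind of workload.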

    Now, Microsoft has announced that they are upgrading these A100 GPUs to the new generation, the H100.

    Here's a fairly technical video explaining the hardware used. I don't recommend watching it at a very late hour though, as it uses a lot of big words:

    Unamused Snarktooth. Advocate for hearing loss & accessibility. Person, friend and a terrible/terrific* artist.
    *delete as appropriate

  • GoLLuM13's Avatar
    Level 52
    Thank you for this article, it was very interesting to read.
    Tag me to be sure I see the answer and reply to you / Taguez moi pour être sûr que je vois la réponse et vous réponde en retour
    Most of my writings in no particular order (mostly in French) / La plupart de mes écrits sans ordre particulier
    >> HERE/ ICI <<

  • Saka's Avatar
    Level 52
    @GoLLuM13 Glad you liked it, I accidentally stumbled upon the announcement from Microsoft and started digging a little more. 😊

    There were some other technical topics I've been meaning to write about, but my migraines have pushed that back, and now I don't know if there's any point in discussing not-so-fresh news. Although maybe I will put something together for the sake of discussion (it's about Intel's discrete graphics, if that helps).
    Unamused Snarktooth. Advocate for hearing loss & accessibility. Person, friend and a terrible/terrific* artist.
    *delete as appropriate
  • AhmedOsmaan's Avatar
    Level 17
    I find the development of ChatGPT from OpenAI really fascinating. The fact that it is based on the GPT-3 language model and went through intense deep learning to produce sensible text is quite impressive. It's interesting to know that it used around 10,000 Nvidia V100 GPUs for training, which is a really expensive investment. The fact that it now runs on eight A100 GPUs, past the learning stage, is impressive as well.