The hardware behind ChatGPT

  • Saka's Avatar
    Level 52
    I posted this yesterday on the English boards, but in case some people prefer to interact on the Nordics boards, I am crossposting it here. I stumbled upon a mention of Microsoft upgrading the system running ChatGPT, and it got me intrigued, so I started digging into the topic. This post is a compilation of several sources.

    Currently an extremely popular tool is ChatGPT from OpenAI. It is based on the GPT-3 language model. Thanks to the very intensive deep learning it went through, it can generate quite sensible pieces of text, depending on the use case. It is still not recommended for scientific purposes, as it is known to produce something akin to "believable bullsh*t".

    How intensive was the training actually?
    UBS analyst Timothy Arcuri claims ChatGPT used 10,000 Nvidia GPUs to train the model, and that number is listed in multiple sources. The GPUs in question are Nvidia V100s. The price of each unit often exceeds $10,000, which makes the project really expensive. Microsoft was responsible for building the supercomputer for OpenAI but refused to disclose the cost, saying only that it was "several hundred million dollars". An interesting detail: by 2020, when the supercomputer was built, the Volta-based V100 was already fairly old tech, but its Ampere-based successor came out a bit too late for the project, as an investment of that scale needs thorough planning.
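    As a back-of-the-envelope check, those two numbers alone already land in the "several hundred million" ballpark. A minimal sketch, assuming the ~$10,000 list price per V100 (an estimate, not a disclosed figure):

```python
# Rough lower bound on the training cluster's hardware cost.
# Assumptions: 10,000 V100 GPUs (per the UBS estimate) at roughly
# $10,000 each. Interconnect, CPUs, storage and the datacenter
# itself are NOT included, so the real bill is necessarily higher.
gpu_count = 10_000
price_per_gpu_usd = 10_000

gpu_cost = gpu_count * price_per_gpu_usd
print(f"GPUs alone: ${gpu_cost:,}")  # GPUs alone: $100,000,000
```

    So the GPUs by themselves account for roughly $100 million, which is consistent with Microsoft's vague "several hundred million dollars" once the rest of the system is added on top.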


    Currently ChatGPT runs on eight A100 GPUs, as it is past the training stage.

    ServeTheHome has a really nice article describing these GPUs. Basically, they are a product designed for servers handling big data and similar workloads. Hence they have no fans or display outputs, and they are equipped with massive heatsinks that allow dissipating large amounts of heat while the cards are stacked densely.

    [Image: nvidia-a100-pcie-3qtr-top-left-2c50-d.jpg]

    The spec sheet is on Nvidia's website. A noticeable thing is the huge amount of VRAM. It is what allows large amounts of data to be processed in a single batch. To give some idea: on my mobile RTX 3070 I cannot run a diffusion model on images larger than 250x250, because my 8 GB of VRAM is not enough. To generate a wallpaper-sized picture, one pretty much needs a server-level GPU or a workstation with several professional GPUs.
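    The reason image size bites so hard is that activation memory grows roughly with the pixel count, i.e. quadratically in the edge length. A hedged sketch of that scaling (the actual memory use depends on the model, batch size and precision, so this only shows the ratio, not real numbers):

```python
# Very rough VRAM scaling intuition for image generation.
# Assumption: activation memory scales with width * height.
# This ignores model weights and framework overhead, which
# are constant with respect to image size.

def pixel_ratio(width: int, height: int, base_side: int) -> float:
    """How many times more pixels (and thus, roughly, activation
    memory) a width x height image needs vs a base_side x base_side one."""
    return (width * height) / (base_side * base_side)

# Going from 250x250 to a 1920x1080 wallpaper:
ratio = pixel_ratio(1920, 1080, 250)
print(f"~{ratio:.1f}x more activation memory")  # ~33.2x
```

    So if 250x250 already fills 8 GB on my card, a full-HD wallpaper would need on the order of dozens of gigabytes for activations alone, which is exactly the territory of an 40/80 GB server GPU like the A100.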

    Now, Microsoft has announced that it is upgrading these A100 GPUs to the new generation, the H100.

    Here's a fairly technical video explaining the hardware used. I don't recommend watching it at a very late hour though, as it uses a lot of big words:

    Unamused Snarktooth. Advocate for hearing loss & accessibility. Person, friend and a terrible/terrific* artist.
    *delete as appropriate

  • DoctorEldritch's Avatar
    Community Manager
    @Saka AIs like ChatGPT are increasingly making their way out of specialized niche products and into popular culture. Even South Park made one of its latest episodes about them, I hear. I guess this shows that people are becoming more interested in them, more so as they start to enter our daily lives.