The rapidly expanding complexity of deep learning models

Blue image download3 1000

It was recently reported that Google trained a trillion-parameter NLP AI model. While this sounds like an enormous number of parameters, it should not come as a surprise. After all, models keep growing: OpenAI’s GPT-3 (Generative Pre-trained Transformer) had 175 billion parameters, its predecessor GPT-2 had 1.5 billion parameters, and Microsoft’s Turing Natural Language Generation (T-NLG) model had 17 billion parameters.

What do you get when you add parameters? Ostensibly, you get a better model and more natural interaction.

But you also get increased computational requirements. Larger networks take more operations to train and execute. OpenAI published a chart (recreated as Figure 1 below) showing that the computational requirements to train deep learning models were doubling every 3.4 months! This translates to higher processing costs in the cloud or potentially slower response time which might be important for interactive applications.

Counteracting this, OpenAI also showed that algorithmic advances are helping to compensate for this growth (Figure 2 below). From 2012 to 2020, the amount of computation needed to train an image classification model has decreased by 44x, or roughly by half every 16 months.

Is it inevitable that models will get slower and more expensive? Can algorithmic advances help stem the tide? Clearly, any methods that optimize performance or reduce the computation will help continue the inexorable advancement of AI. We think it is possible for AI-driven expert systems to automatically optimize computationally dense workloads (see our previous post on machine programming). We’re working on our first products to demonstrate this, and we’ll be sharing more in coming weeks.

AlexNet to AlphaGo Zero: A graph showing a 300,000x Increase in compute

Figure 1: AlexNet to AlphaGo Zero: A 300,000x Increase in compute. From OpenAI

A graph showing compute efficiency gains from 2012-2020

Figure 2: 44x less compute required to get to AlexNet performance 7 years later (log scale). From OpenAI