What would a 100x performance increase mean to your software? Given the speed of today’s processors, some types of applications are already fast enough. But many applications are both computationally demanding and require high performance. Examples include real-time sensing, games, voice recognition, and evaluating risk in product portfolios. Poor performance in such applications can be irritating (in the case of games and voice recognition), costly (when evaluating risk in volatile financial markets), or even deadly (in computer vision for autonomous vehicles).
How do you make such demanding applications fast enough? Traditionally, companies have turned to specialized performance engineers who understand both the workload and the underlying hardware architecture. These engineers then deploy their “mad ninja skilz” to optimize the software, often increasing performance by 10x and sometimes by 100x or more. The difference between the performance of naïve code and optimized code is sometimes called the “ninja gap.” At a 100x speedup, for example, the optimized code finishes in 1% of the original time, so 99% of the naïve code’s runtime was avoidable.
Unfortunately, these “ninja” programmers are increasingly rare. A typical “ninja” programmer has a PhD in computer engineering and is an expert in systems, compilers, languages, and microarchitecture. “Ninja programming skilz” are not typically taught in universities; instead, they are acquired by people who relish this challenging area of engineering. The few ninjas who have acquired the requisite skills are in high demand.
Bridging the Ninja Gap
So what are the possibilities for addressing the “ninja gap”?
Training more people to become ninja programmers is not necessarily the best answer, as we discussed in a previous blog post. One of the biggest barriers to this approach is the increasing complexity of hardware and software. For example, the knowledge and skills required to get a 10x or 100x increase in application performance can differ significantly depending on the target processor, toolchain, and processor vendor. GPUs, with their separate memory space and massive parallelism, can demand an entirely new knowledge base and skill set, which again can differ significantly according to the GPU architecture, the host CPU system, the available software toolchain, and, of course, the GPU vendor.
In the absence of skilled optimization engineers, some companies choose to throw more hardware at the problem. The idea is to compensate for inefficient software by running on more processor cores in the cloud or by procuring more systems.
Make Everyone a Ninja Programmer
Both previous options are costly. We believe there is a third, new option for addressing the ninja gap: it is now possible for machines to automate parts of the performance optimization task (see our discussion of machine programming in a previous post). This belief rests on the observation that machine learning (ML) has the capacity to reshape the way software is developed. Academic research results indicate that automatic code optimization is not only possible but can produce results better than the best human-optimized code.
According to the authors of The Three Pillars of Machine Programming, it should be possible for everyone to become a ninja programmer. “We envision machine learning and automated reasoning techniques that will enable new programming systems; systems that will deliver a significant degree of automation to reduce the cost of producing secure, correct, and efficient software. These systems will also enable non-programmers to harness the full power of modern computing platforms to solve complex problems correctly and efficiently.”
Automation can provide enormous benefits to companies developing performance-critical software. Companies that already employ ninja programmers can “10X” the capabilities of their team by equipping them with AI-driven software optimization technology. Companies without such a team can use the same technology to improve the performance of their software.
Sound too good to be true? We’d love to tell you how we do it.