The Future of Machine Learning is Already Here
By Hari H. Krishnaan
Senior Product Manager, NVXL Technology, Inc.
Part 2 of 2

In Part 1 of my blog, I explained the nature of machine learning, its challenges, and its history. As we know from Part 1, machine learning has been around since the 1940s. Today, some of the most popular compute-intensive machine learning algorithms include logistic regression, decision trees, random forests, k-means clustering, alternating least squares, and naïve Bayes. What they all have in common is that they take an extremely long time to complete on hardware without acceleration. Machine learning models such as GoogLeNet and ResNet also have very demanding compute requirements: GoogLeNet requires roughly 1.43 billion operations per inference and has 7 million parameters, while ResNet requires about 3.9 billion operations and has 25.5 million parameters. All of the above-mentioned algorithms and models require hardware that can accommodate these complex compute requirements. If this processing is done using traditional CPUs, it takes a long time to complete. With the appropriate hardware acceleration, however, this time can be reduced dramatically.
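To make those operation counts concrete, here is a back-of-the-envelope estimate of per-inference latency from a model's operation count and a device's sustained throughput. The throughput figures are illustrative assumptions for the sketch, not NVXL benchmarks or measured values:

```python
# Back-of-the-envelope inference-time estimate from a model's operation count.
# Throughput numbers below are illustrative assumptions, not measured values.

GOOGLENET_OPS = 1.43e9   # ~1.43 billion operations per inference
RESNET_OPS = 3.9e9       # ~3.9 billion operations per inference

CPU_OPS_PER_SEC = 1e11    # assumed: ~100 GOPS sustained on a CPU
ACCEL_OPS_PER_SEC = 1e13  # assumed: ~10 TOPS on an accelerator

def inference_time_ms(model_ops, device_ops_per_sec):
    """Idealized latency: operations divided by sustained throughput."""
    return model_ops / device_ops_per_sec * 1000

for name, ops in [("GoogLeNet", GOOGLENET_OPS), ("ResNet", RESNET_OPS)]:
    cpu = inference_time_ms(ops, CPU_OPS_PER_SEC)
    acc = inference_time_ms(ops, ACCEL_OPS_PER_SEC)
    print(f"{name}: CPU ~{cpu:.1f} ms, accelerator ~{acc:.2f} ms")
```

Even under these simplified assumptions, the same two-orders-of-magnitude gap in throughput translates directly into a two-orders-of-magnitude gap in latency, which is why acceleration matters at these operation counts.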

To fill the void caused by the slowdown in compute performance improvements, NVXL has engineered a scalable approach to the ever-increasing appetite for compute performance using a new technology: polymorphic acceleration. It is built on the general premise that a multitude of compute-intensive applications needs a multitude of computing architectures, including CPU, FPGA, GPU, and a new breed of ASICs currently in development. Hence, instead of building silos of specific computing architectures, data centers can use polymorphic acceleration to offer software-defined acceleration platforms in real time, using a flexible accelerator pool that can easily be expanded and upgraded. Application performance is improved at a lower cost, without the risk of betting on one particular device architecture.

As the name suggests, polymorphic acceleration is all about providing a single interface for multiple computing architectures and accelerating an application. It unifies and creates synergy among various aspects of computing that have formerly been considered in isolation.

From a programming point of view, polymorphic acceleration generalizes conventional heterogeneous computing by providing a common programming interface, so that applications can be executed on a mix of device architectures including CPU, FPGA, and GPU. Users do not need separate software stacks to execute on GPU or FPGA devices, and future device architectures can be added to the mix simply by adding architecture-specific backends.
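The shape of such a common interface can be sketched as a dispatcher with pluggable, architecture-specific backends. All names here are hypothetical illustrations of the idea, not NVXL's actual API:

```python
# Hypothetical sketch of a common acceleration interface with pluggable
# device backends; class and method names are illustrative, not NVXL's API.

class Backend:
    """Architecture-specific backend: runs a task on one device type."""
    def run(self, task, data):
        raise NotImplementedError

class CPUBackend(Backend):
    def run(self, task, data):
        return f"{task} on CPU: {data}"

class FPGABackend(Backend):
    def run(self, task, data):
        return f"{task} on FPGA: {data}"

class Accelerator:
    """Single programming interface. Backends are registered per
    architecture, so new device types can be added without changing
    application code."""
    def __init__(self):
        self._backends = {}

    def register(self, arch, backend):
        self._backends[arch] = backend

    def execute(self, task, data, arch="cpu"):
        return self._backends[arch].run(task, data)

acc = Accelerator()
acc.register("cpu", CPUBackend())
acc.register("fpga", FPGABackend())
print(acc.execute("k-means", [1, 2, 3], arch="fpga"))
```

The key design point is that the application calls one `execute` interface; supporting a new device architecture means registering one new backend, not rewriting the application against another software stack.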

NVXL’s technology accelerates compute-intensive data flow graphs, algorithms, and functions for CNN/DNN, machine learning, transcoding, encryption, compression, database query, and more. The NVXL Acceleration Layer supports both NVXL-created and partner-created libraries: CaffeNet, AlexNet, VGGNet, ResNet, GoogLeNet, U-Net, SqueezeNet, SSD, YOLO2, RNN, LSTM, GRU, PairHMM, logistic regression, k-means, decision trees, random forests, and H.265 (NoSQL acceleration support will be available in the future). The NVXL Acceleration Layer also integrates efficiently with industry-leading frameworks including Caffe, Caffe2, TensorFlow, GATK, FFmpeg, and Apache Spark.

Historically, many applications have had to rely on traditional scaling, which is structured simply and designed around maximum demand. This method keeps the application running all the time at maximum capacity, which is inefficient. Time-based scaling is an improvement because it is designed for maximum use during peak times of day, but it still carries a fairly high Total Cost of Ownership (TCO).

Real-time scaling, however, is designed around minimum demand. Using this approach, different algorithms can be dynamically loaded and provisioned, so the hardware is free for other tasks until needed. This keeps idle capacity low, delivers much higher efficiency and a lower TCO, and allows ‘spare’ compute power to be used by other applications when not needed. See the graphic below.

Real-time scaling for the best TCO
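The acquire-on-demand, release-when-done pattern behind real-time scaling can be sketched as a shared device pool. The names and the simple checkout policy are assumptions for illustration, not NVXL's implementation:

```python
# Illustrative sketch of real-time provisioning from a shared accelerator
# pool; names and policy are assumptions, not NVXL's implementation.

class AcceleratorPool:
    """Devices are checked out only while a job runs, then returned,
    so unused capacity stays available to other applications."""
    def __init__(self, devices):
        self.free = list(devices)
        self.busy = []

    def acquire(self):
        dev = self.free.pop()
        self.busy.append(dev)
        return dev

    def release(self, dev):
        self.busy.remove(dev)
        self.free.append(dev)

pool = AcceleratorPool(["fpga0", "fpga1", "gpu0"])
dev = pool.acquire()   # provision a device only when demand arrives...
# ... run the dynamically loaded algorithm on `dev` ...
pool.release(dev)      # ...then return its capacity to the pool
```

Because capacity is held only for the duration of a job, the pool can be sized for typical rather than peak demand, which is where the TCO advantage over traditional and time-based scaling comes from.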

Harnessing the power and flexibility of FPGAs, the NVXL acceleration platform integrates seamlessly to industry-leading frameworks and implements a data flow architecture using dynamically provisioned instances created from a pool of acceleration devices. The NVXL platform abstracts all the low-level details for developers and data scientists.

NVXL is leading the way, providing innovative acceleration platform solutions designed to accelerate and scale compute performance and efficiency in machine learning to new levels while attaining a new standard in total cost of ownership.
