Deep Learning
Deep learning models such as GoogLeNet and ResNet have very demanding compute requirements: GoogLeNet requires 1.43G operations per inference and has 7M parameters, while ResNet requires 3.9G operations and 25.5M parameters. Workloads of this scale require hardware that can accommodate such complex compute. On traditional CPUs this processing would take a long time to complete, whereas with the appropriate hardware acceleration that time can be reduced dramatically.
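A back-of-envelope calculation shows why the operation counts above make acceleration attractive. The sustained-throughput figures in this sketch are illustrative assumptions, not measured numbers for any particular CPU or accelerator:

```python
# Rough single-image inference-time estimate from the op counts quoted
# above. Throughput figures are illustrative assumptions only.
GIGA = 1e9

models = {
    "GoogLeNet": 1.43 * GIGA,  # ~1.43G operations per inference
    "ResNet":    3.9 * GIGA,   # ~3.9G operations per inference
}

CPU_OPS_PER_SEC   = 0.1e12   # assumed ~100 Gop/s sustained on a CPU
ACCEL_OPS_PER_SEC = 10e12    # assumed ~10 Top/s on a hardware accelerator

for name, ops in models.items():
    cpu_ms = ops / CPU_OPS_PER_SEC * 1e3
    accel_ms = ops / ACCEL_OPS_PER_SEC * 1e3
    print(f"{name}: CPU ~{cpu_ms:.1f} ms, accelerator ~{accel_ms:.2f} ms "
          f"(~{cpu_ms / accel_ms:.0f}x speedup)")
```

Under these assumed throughputs, the speedup is simply the ratio of the two rates (100x here), independent of model size; the absolute per-image latency is what scales with the operation count.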
Machine Learning
Machine learning uses compute-intensive algorithms and advanced statistical techniques to enable software applications and computing systems to "learn" from data, predicting outcomes with a high level of accuracy without being explicitly programmed. Compute acceleration shortens the time to value when mining historical relationships and trends with complex analytical models, allowing researchers, data scientists, bioinformaticians, engineers, and others to improve productivity through speed, accuracy, consistency, and repeatability.
Transcoding
Transcoding remains a CPU-intensive process. High Efficiency Video Coding (HEVC), also known as H.265, is a video compression standard designed to substantially improve compression efficiency compared to prior generations (e.g., AVC/H.264). With the continued growth of video streaming on the Internet through popular services such as Netflix, Hulu, and YouTube, and with 4K cameras gaining market share, bandwidth requirements are growing at a much faster rate. Yet even with the increasing adoption of HEVC, traditional transcoding services cannot keep up with the demand for compute performance.
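A simple calculation illustrates the bandwidth stakes. The 4K bitrate and the roughly 50% saving below are illustrative assumptions in line with HEVC's design target, not figures from any specific service:

```python
# Back-of-envelope look at HEVC's bandwidth impact for 4K streaming.
# Bitrates are illustrative assumptions, not measured values.
avc_4k_mbps = 32.0   # assumed AVC/H.264 bitrate for a 4K stream
hevc_saving = 0.5    # HEVC targets roughly half the bitrate at equal quality
hevc_4k_mbps = avc_4k_mbps * (1.0 - hevc_saving)

concurrent_viewers = 1_000_000
avc_gbps = avc_4k_mbps * concurrent_viewers / 1000.0
hevc_gbps = hevc_4k_mbps * concurrent_viewers / 1000.0

print(f"AVC:  {avc_gbps:,.0f} Gb/s aggregate for {concurrent_viewers:,} viewers")
print(f"HEVC: {hevc_gbps:,.0f} Gb/s aggregate (saves {avc_gbps - hevc_gbps:,.0f} Gb/s)")
```

The bandwidth saving is why the industry moves to HEVC despite its much higher encode cost: each stream served costs roughly half the network capacity, but producing it demands far more compute, which is the gap acceleration fills.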
NVXL’s Universal Acceleration Server (UAS) is an innovative platform that can accelerate multiple workloads, with each workload accelerated by one or more FPIPEs. The internal PCIe fabric supports an aggregate bandwidth of 96GB/sec for acceleration, providing high performance and low latency. The persistent fabric controls scheduling and data movement between accelerators and can be PCIe-based, RDMA-based (over Ethernet or InfiniBand), or a hybrid. Multiple DS-1 servers can be connected via RDMA over Ethernet (or InfiniBand) to achieve the cloud-scale disaggregated resource pools necessary for massively parallel processing across thousands of FPGAs.
NVXL’s FPGA-based architecture in the DS-1 UAS processes algorithmic operations far more efficiently than commodity CPUs, which execute sequences of instructions on a fixed number of individual cores. A single FPGA maps algorithmic functions, such as those in deep learning, onto massively parallel logic elements, efficiently executing operations at exceptional speed and with high accuracy.
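The contrast between instruction-serial execution and spatial parallelism can be sketched with a toy multiply-accumulate, the core operation of deep learning layers. This is a pure-Python model of the dependency structure, not real concurrency: the "FPGA-style" form only shows that the products are independent and could be computed by parallel hardware multipliers feeding a log-depth adder tree:

```python
# Toy contrast: a CPU evaluates multiply-accumulates as one long
# dependency chain; FPGA logic can evaluate the independent products
# simultaneously and combine them in an adder tree.
weights = [0.2, -0.5, 0.1, 0.7]
inputs  = [1.0,  2.0, 3.0, 4.0]

# CPU-style: a single accumulator chain, one instruction after another.
acc = 0.0
for w, x in zip(weights, inputs):
    acc += w * x

# FPGA-style: all products are independent (parallel multipliers) ...
products = [w * x for w, x in zip(weights, inputs)]
# ... then reduced by a log-depth adder tree (2 levels for 4 inputs).
while len(products) > 1:
    products = [sum(products[i:i + 2]) for i in range(0, len(products), 2)]

assert abs(acc - products[0]) < 1e-9
print(f"dot product = {products[0]}")
```

Both forms compute the same value; the difference is depth. The chain takes a step per term, while the tree's depth grows only logarithmically with the number of terms, which is why spatial architectures scale so well on these workloads.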