By Rongguo Zhang, AI R&D Engineer, Inspur Information
GPUs offer a huge advantage in large-scale parallel computing and are ideal for accelerating workloads such as big data processing, AI training and inference, and image rendering. In practice, however, GPU deployments often suffer from poor resource management and low utilization. AIStation is an Inspur-developed AI development platform designed specifically to address these issues with an easy-to-set-up, fine-grained GPU resource scheduling system.
Pain points for maintaining GPU computing resources
AI developers, AI system researchers, and enterprises in the midst of digital transformation may run into the following issues when using GPU computing resources:
To deal with these issues, the AIStation inference platform provides stable allocation, scheduling, and management of fine-grained GPU resources, giving enterprise users an effective way to use their GPU resources efficiently.
Overview of AIStation's GPU sharing
The AIStation inference platform includes a GPU sharing system that lets any application using GPUs as computing resources share a single GPU accelerator card among multiple containers (or services). AIStation supports fine-grained division, allocation, and scheduling of both GPU memory and GPU kernel resources. Users can deploy different types of services on the same GPU, so the utilization rate of GPU resources can reach as high as 100%. AIStation also ensures memory isolation among different services. By calculating the optimal scheduling strategy, it produces a placement scheme that minimizes surplus resources and ensures security for pre-deployed services. Once services are properly scheduled onto GPUs, the remaining idle GPU resources stay available to other services. This resource scheduling also applies to GPU resources across nodes.
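To make the idea of minimizing surplus resources concrete, here is a minimal sketch of a best-fit placement rule. It is not AIStation's actual scheduler, whose algorithm the article does not describe; the `Gpu` class, the `place_best_fit` function, and the service names and sizes are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Gpu:
    """Hypothetical view of one GPU's remaining shareable capacity."""
    name: str
    free_mem_gb: float
    free_compute_pct: float
    services: list = field(default_factory=list)


def place_best_fit(gpus, service, mem_gb, compute_pct):
    """Place a service on the GPU that would have the least memory left over.

    Returns the chosen GPU name, or None if no GPU has enough free capacity.
    """
    candidates = [
        g for g in gpus
        if g.free_mem_gb >= mem_gb and g.free_compute_pct >= compute_pct
    ]
    if not candidates:
        return None
    # Best fit: smallest leftover memory first, leftover compute as tie-breaker.
    best = min(
        candidates,
        key=lambda g: (g.free_mem_gb - mem_gb, g.free_compute_pct - compute_pct),
    )
    best.free_mem_gb -= mem_gb
    best.free_compute_pct -= compute_pct
    best.services.append(service)
    return best.name


# Two 32 GB GPUs and four illustrative services (memory in GB, compute in %).
gpus = [Gpu("gpu-0", 32, 100), Gpu("gpu-1", 32, 100)]
requests = [("ocr", 8, 25), ("asr", 16, 50), ("nlp", 8, 25), ("detect", 4, 20)]
placements = {name: place_best_fit(gpus, name, mem, pct) for name, mem, pct in requests}

for name, gpu_name in placements.items():
    print(f"{name} -> {gpu_name}")
```

In this example the first three services fill `gpu-0` completely, and only the fourth spills onto `gpu-1`, which keeps most of its capacity free for later services. A production scheduler would also have to account for isolation, priorities, and cross-node placement.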
AIStation can also scale GPU resources at a fine granularity based on HPA and QPS: the number of service replicas can be scaled according to metrics such as CPU utilization, average memory utilization, and QPS.
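As a rough illustration of metric-driven replica scaling, the sketch below applies the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler, assuming that is what HPA refers to here. The article does not say how AIStation computes replica counts, so the function name, the replica bounds, and the QPS figures are assumptions, and the tolerance band and stabilization windows of the real autoscaler are left out.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=10):
    """Return the replica count suggested by the HPA-style ratio rule.

    desired = ceil(current_replicas * current_value / target_value),
    clamped to [min_replicas, max_replicas]. The metric can be per-replica
    QPS, CPU utilization, or memory utilization.
    """
    raw = math.ceil(current_replicas * current_value / target_value)
    return max(min_replicas, min(max_replicas, raw))


# Each of 2 replicas sees 180 QPS against a target of 100 QPS: scale out.
print(desired_replicas(2, 180, 100))  # 4
# Each of 4 replicas sees only 30 QPS against the same target: scale in.
print(desired_replicas(4, 30, 100))   # 2
```

The same rule covers scale-out and scale-in, because the ratio of observed to target metric is above 1 under load and below 1 when replicas are idle.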
Extremely low loss in computing performance. AIStation's GPU sharing system has an average performance loss of 1.3%, which has minimal impact on application performance.
AIStation's scenario-based design
Non-invasive architectural design. AIStation can be easily integrated into other platforms, and deployed with only YAML and Docker images. It is available out-of-the-box and ready to go.
High availability (HA). Each control component in the GPU sharing system is designed to be highly available. Every module runs multiple instances, but only one instance at a time acts as the leader and carries out the module's work. If the leader crashes, a new leader is immediately elected so that the control component remains available.
Refined monitoring. AIStation monitors the GPU utilization of each user’s applications in real time, and calculates and stores the relevant data, enabling fine-grained, real-time monitoring of GPU utilization.
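The single-leader behavior described under high availability can be sketched with a lease-based lock, which is one common way to implement leader election. The article does not state which mechanism AIStation uses, so the `LeaseLock` class, the TTL value, and the instance names below are hypothetical; time is passed in explicitly to keep the example deterministic.

```python
class LeaseLock:
    """Toy lease: the holder stays leader as long as it keeps renewing."""

    def __init__(self, ttl):
        self.ttl = ttl          # lease duration in seconds
        self.holder = None      # current leader, if any
        self.expires_at = 0.0   # time at which the lease lapses

    def try_acquire(self, candidate, now):
        """Acquire or renew the lease; return True if candidate is leader."""
        lease_free = self.holder is None or now >= self.expires_at
        if lease_free or self.holder == candidate:
            self.holder = candidate
            self.expires_at = now + self.ttl
            return True
        return False


lock = LeaseLock(ttl=5)
events = [
    lock.try_acquire("instance-a", now=0),  # a becomes leader
    lock.try_acquire("instance-b", now=1),  # b is rejected, a still holds the lease
    lock.try_acquire("instance-a", now=3),  # a renews, lease now runs until t=8
    # instance-a crashes here and stops renewing
    lock.try_acquire("instance-b", now=7),  # lease has not expired yet
    lock.try_acquire("instance-b", now=8),  # lease expired, b takes over
]
print(events, lock.holder)
```

With this approach the failover delay is bounded by the lease TTL, so a shorter TTL gives faster takeover at the cost of more frequent renewals.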
A company in the financial industry needed a unified algorithm application platform for its insurance service, both to centrally manage algorithm applications from different ISVs and to improve resource utilization. The reuse rate of its GPU resources was severely limited, and human intervention was required to handle the massive volume of inference and computation. If load was not rebalanced in time at peak periods, problems such as slow responses to requests, high computation latency, and interrupted computation would emerge.
By introducing the AIStation inference platform, resource management was greatly improved in all large-scale inference scenarios. Most notably, the reuse rate of GPU resources was increased by 300%. This not only allowed the customer to flexibly deal with different types of online inference services, but also greatly enhanced the stability of their business system.
A company in the energy industry has two GPU servers, each with eight V100 cards with 32 GB of memory per card. These are shared among a 28-person development team. The company needed to allocate the 16 available GPUs properly for its developers' inference tests. With fewer GPUs than developers, allocating and using GPU resources efficiently was a major problem.
With Inspur AIStation, each GPU was divided into 8 instances, each allocated 4 GB of memory. In this way, the 16 GPU cards provided 128 instances, with 4 to 5 instances available to each developer. The utilization rate of each GPU became 8 times higher than before.
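The arithmetic behind these figures can be checked with a few lines of Python. The numbers come from the case above; the variable names and the even split across developers are illustrative assumptions.

```python
servers = 2
gpus_per_server = 8
gpu_mem_gb = 32        # memory per V100 card
instance_mem_gb = 4    # memory per shared instance
developers = 28

total_gpus = servers * gpus_per_server               # 16 cards
instances_per_gpu = gpu_mem_gb // instance_mem_gb    # 8 instances per card
total_instances = total_gpus * instances_per_gpu     # 128 instances

# Split instances as evenly as possible across the team.
base, extra = divmod(total_instances, developers)    # 4 each, 16 left over

print(f"{total_instances} instances in total")
print(f"{extra} developers get {base + 1} instances, "
      f"{developers - extra} developers get {base}")
```

An even split gives 16 developers five instances and the remaining 12 developers four, which matches the 4 to 5 instances per developer quoted above.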