NVIDIA today introduced its A2 Tensor Core GPU, an entry-level GPU that the company says can deliver up to 20 times the inference performance of CPU-only servers despite its low power draw and small footprint. Inference is the stage of deep learning in which a model produced during the training (i.e., learning) phase is applied to new data inside an application. NVIDIA also says the A2 provides up to 1.3x more performance in various intelligent edge use cases.
From NVIDIA:
A2’s versatility, compact size, and low power exceed the demands for edge deployments at scale, instantly upgrading existing entry-level CPU servers to handle inference. Servers accelerated with A2 GPUs deliver higher inference performance versus CPUs and more efficient intelligent video analytics (IVA) deployments than previous GPU generations—all at an entry-level price point. […]
AI inference is deployed to make consumer lives more convenient through real-time experiences, and enables them to gain insights on trillions of end-point sensors and cameras. Compared to CPU-only servers, the servers built with NVIDIA A2 Tensor Core GPU offer up to 20X more inference performance, instantly upgrading any server to handle modern AI.
A2 Tensor Core Specifications
(Where a row lists two values, the second is the figure with sparsity.)
Peak FP32 | 4.5 TF |
TF32 Tensor Core | 9 TF | 18 TF |
BFLOAT16 Tensor Core | 18 TF | 36 TF |
Peak FP16 Tensor Core | 18 TF | 36 TF |
Peak INT8 Tensor Core | 36 TOPS | 72 TOPS |
Peak INT4 Tensor Core | 72 TOPS | 144 TOPS |
RT Cores | 10 |
Media engines | 1 video encoder, 2 video decoders (includes AV1 decode) |
GPU memory | 16GB GDDR6 |
GPU memory bandwidth | 200GB/s |
Interconnect | PCIe Gen4 x8 |
Form factor | 1-slot, low-profile PCIe |
Max thermal design power (TDP) | 40–60W (configurable) |
Virtual GPU (vGPU) software support | NVIDIA Virtual PC (vPC), NVIDIA Virtual Applications (vApps), NVIDIA RTX Virtual Workstation (vWS), NVIDIA AI Enterprise, NVIDIA Virtual Compute Server (vCS) |
Source: NVIDIA
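For readers who want to compare the table's figures, the short Python sketch below works through two pieces of arithmetic: the ratio between the two values listed for each Tensor Core precision, and a rough peak-throughput-per-watt figure at both ends of the configurable TDP range. The numbers are copied from the table; the names `PEAK`, `TDP_WATTS`, `per_watt`, and `summarize` are illustrative only. It assumes the listed peak is reachable at either power setting, which the table does not state, so the per-watt results are best read as upper bounds rather than measured efficiency.

```python
# Peak Tensor Core figures copied from the specification table.
# Each entry is (first listed value, second listed value);
# units are TF for floating point and TOPS for integer formats.
PEAK = {
    "TF32 (TF)": (9, 18),
    "BFLOAT16 (TF)": (18, 36),
    "FP16 (TF)": (18, 36),
    "INT8 (TOPS)": (36, 72),
    "INT4 (TOPS)": (72, 144),
}

# Configurable TDP range from the table, in watts (low, high).
TDP_WATTS = (40, 60)


def per_watt(peak, watts):
    """Peak throughput divided by board power.

    Assumption: the peak figure is attainable at this power setting.
    """
    return peak / watts


def summarize():
    """Return one dict per precision with the ratio and per-watt figures."""
    rows = []
    for name, (first, second) in PEAK.items():
        rows.append({
            "precision": name,
            "ratio": second / first,
            "per_watt_at_40w": per_watt(first, TDP_WATTS[0]),
            "per_watt_at_60w": per_watt(first, TDP_WATTS[1]),
        })
    return rows


if __name__ == "__main__":
    for row in summarize():
        print(
            f"{row['precision']:<14} "
            f"ratio={row['ratio']:.1f}  "
            f"per W @40W={row['per_watt_at_40w']:.2f}  "
            f"per W @60W={row['per_watt_at_60w']:.2f}"
        )
```

In every row the second value is exactly double the first, and each step down in precision from TF32 to FP16 to INT8 to INT4 doubles the listed peak again.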