The NPU runs at clock rates of up to 900 MHz and delivers up to 4.5 TOPS (tera operations per second). It is optimized for models based on convolutional neural networks (CNNs) and includes a Parallel Processing Unit (PPU) that supports 32-bit floating-point pipelining and threading.
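As a rough sanity check, the two headline figures can be combined into the peak amount of work per clock cycle. This assumes the 4.5 TOPS rating is reached at the 900 MHz maximum clock. It also assumes the common vendor convention that one multiply-accumulate (MAC) counts as two operations, which the text above does not state.

$$
\frac{4.5 \times 10^{12}\ \text{ops/s}}{900 \times 10^{6}\ \text{cycles/s}} = 5000\ \text{ops/cycle} \;\approx\; 2500\ \text{MACs/cycle} \quad (\text{if } 1\ \text{MAC} = 2\ \text{ops})
$$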
...
| Model | Model size | Input shape (ONNX: [n, c, h, w]; TFLite: [n, h, w, c]) | Total DDR read (MB) | Total DDR write (MB) | Average inference time (ms) | Frame rate, excluding other latency (fps) |
|---|---|---|---|---|---|---|
| AlexNet (.onnx) | 233 MB | [1, 3, 224, 224] | 47.03 | 1.31 | 8.41 | 118.91 |
| Inception-v1 (.onnx) | 27 MB | [1, 3, 224, 224] | 16.7 | 5.24 | 3.97 | 251.89 |
| Inception-v2 (.onnx) | 43 MB | [1, 3, 224, 224] | 14.47 | 1.84 | 7.68 | 130.21 |
| MobileNet-v2 (.onnx) | 14 MB | [1, 3, 224, 224] | 5.25 | 1.24 | 1.94 | 515.46 |
| EfficientNet-Lite4 (.onnx) | 50 MB | [1, 3, 224, 224] | 15.69 | 4.68 | 5.00 | 200.00 |
| ResNet-50 (.onnx) | 98 MB | [1, 3, 224, 224] | 39.61 | 13.28 | 16.29 | 61.39 |
| SqueezeNet (.onnx) | 4.8 MB | [1, 3, 224, 224] | 2.33 | 0.37 | 1.29 | 775.19 |
| VGG-16 (.onnx) | 528 MB | [1, 3, 224, 224] | 121.06 | 6.97 | 22.26 | 44.92 |
| DenseNet-121 (.onnx) | 32 MB | [1, 3, 224, 224] | 26.55 | 8.86 | 21.12 | 47.35 |
| GoogleNet (.onnx) | 27 MB | [1, 3, 224, 224] | 15.02 | 4.89 | 3.64 | 274.73 |
| CaffeNet (.onnx) | 233 MB | [1, 3, 224, 224] | 46.13 | 0.37 | 7.09 | 141.04 |
| ShuffleNet-v2 (.onnx) | 8.8 MB | [1, 3, 224, 224] | 4.14 | 1.93 | 2.09 | 478.47 |
| SSD-MobilenetV1 (.tflite) | 26.2 MB | [1, 320, 320, 3] | 11.34 | 5.21 | 5.97 | 167.50 |
| SSD-MobilenetV2 (.tflite) | 17.1 MB | [1, 320, 320, 3] | 12.21 | 6.04 | 5.17 | 193.42 |
| YOLO-v2 (.onnx) | 203.9 MB | [1, 3, 416, 416] | 47.16 | 6.70 | 11.50 | 86.96 |
| YOLO-v5s (.onnx) | 27.9 MB | [1, 3, 640, 640] | 87.91 | 46.65 | 43.64 | 22.91 |
| YOLO-v5s-seg (.onnx) | 29.4 MB | [1, 3, 640, 640] | 130.79 | 78.22 | 58.46 | 17.11 |
| YOLO-v8s-seg (.onnx) | 45 MB | [1, 3, 640, 640] | 163.19 | 101.29 | 64.45 | 15.52 |
| ArcFace (.onnx) | 248.9 MB | [1, 3, 112, 112] | 46.19 | 5.32 | 17.37 | 57.57 |
| DeepLab-v3p (.onnx) | 22.1 MB | [1, 3, 640, 640] | 385.65 | 129.15 | 107.76 | 9.28 |
| 3DDFA (.onnx) | 12.4 MB | [1, 3, 120, 120] | 2.03 | 0.35 | 0.55 | 1818.18 |
| YOLO-v10s (.onnx) | 29.2 MB | [1, 3, 640, 640] | 3258.21 | 3219.47 | 6513.48 | 0.15 |
Note: For more detailed YOLOv8 performance data, refer here.
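The frame-rate column follows directly from the average inference time (fps = 1000 / time in ms). Multiplying it by the DDR columns gives a rough figure for sustained memory traffic. The following is a minimal sketch, assuming the read and write columns are megabytes moved per single inference. The `Benchmark` class and its method names are hypothetical helpers, and only a few rows from the table are included.

```python
from dataclasses import dataclass


@dataclass
class Benchmark:
    """One row of the benchmark table (hypothetical helper, not a vendor API)."""

    name: str
    ddr_read_mb: float   # total DDR read, assumed to be MB per inference
    ddr_write_mb: float  # total DDR write, assumed to be MB per inference
    latency_ms: float    # average inference time in milliseconds

    def fps(self) -> float:
        # Frame rate excluding other latency (pre/post-processing, I/O): 1000 ms / latency.
        return 1000.0 / self.latency_ms

    def ddr_mb_per_s(self) -> float:
        # Assumed per-inference DDR traffic multiplied by inferences per second.
        return (self.ddr_read_mb + self.ddr_write_mb) * self.fps()


# A few rows copied from the table above.
ROWS = [
    Benchmark("MobileNet-v2", 5.25, 1.24, 1.94),
    Benchmark("ResNet-50", 39.61, 13.28, 16.29),
    Benchmark("YOLO-v5s", 87.91, 46.65, 43.64),
]

if __name__ == "__main__":
    for b in ROWS:
        print(f"{b.name:14s} {b.fps():8.2f} fps  {b.ddr_mb_per_s():9.1f} MB/s DDR")
```

Adding the remaining table rows to `ROWS` extends the same comparison to every model. The computed fps values should match the table's last column after rounding to two decimals.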
...