The NPU operates at clock rates of 900 MHz, delivering computing performance of up to 4.1 TOPS (Trillion Operations Per Second). Optimized for AI models based on convolutional neural networks, it includes a Parallel Processing Unit (PPU) with 32-bit floating-point pipelining and threading.
...
Here is our test results:
Model | Model Size | Input shape [n, c, h, w] | Total (DDR) read BW | Total (DDR) write BW | Average inference time | Frame rate without other latency |
AlexNet (.onnx) | 233 MB | [1, 3, 224, 224] | 47.03 (MBytes) | 1.31 (MBytes) | 8.41ms | 118.91 (fps) |
Inception-v1 (.onnx) | 27 MB | [1, 3, 224, 224] | 16.7 (MBytes) | 5.24 (MBytes) | 3.97ms | 251.89 (fps) |
Inception-v2 (.onnx) | 43 MB | [1, 3, 224, 224] | 14.47 (MBytes) | 1.84 (MBytes) | 7.68ms | 130.21 (fps) |
MobileNet-v2 (.onnx) | 14 MB | [1, 3, 224, 224] | 5.25 (MBytes) | 1.24 (MBytes) | 1.94ms | 515.46 (fps) |
EfficientNet-Lite4 (.onnx) | 50 MB | [1, 3, 224, 224] | 15.69 (MBytes) | 4.68 (MBytes) | 5.00ms | 200.00 (fps) |
ResNet-50 (.onnx) | 98 MB | [1, 3, 224, 224] | 39.61 (MBytes) | 13.28 (MBytes) | 16.29ms | 61.39 (fps) |
SqueezeNet (.onnx) | 4.8 MB | [1, 3, 224, 224] | 2.33 (MBytes) | 0.37 (MBytes) | 1.29ms | 775.19 (fps) |
VGG-16 (.onnx) | 528 MB | [1, 3, 224, 224] | 121.06 (MBytes) | 6.97 (MBytes) | 22.26ms | 44.92 (fps) |
DenseNet-121 (.onnx) | 32 MB | [1, 3, 224, 224] | 26.55 (MBytes) | 8.86 (MBytes) | 21.12ms | 47.35 (fps) |
GoogleNet (.onnx) | 27 MB | [1, 3, 224, 224] | 15.02 (MBytes) | 4.89 (MBytes) | 3.64ms | 274.73 (fps) |
CaffeNet (.onnx) | 233 MB | [1, 3, 224, 224] | 46.13 (MBytes) | 0.37 (MBytes) | 7.09ms | 141.04 (fps) |
ShuffleNet-v2 (.onnx) | 8.8 MB | [1, 3, 224, 224] | 4.14 (MBytes) | 1.93 (MBytes) | 2.09ms | 478.47 (fps) |
SSD-MobilenetV1 (.tflite) | 26.2 MB | [1, 320, 320, 3] | 11.34 (MBytes) | 5.21 (MBytes) | 5.97ms | 167.50 (fps) |
SSD-MobilenetV2 (.tflite) | 17.1 MB | [1, 320, 320, 3] | 12.21 (MBytes) | 6.04 (MBytes) | 5.17ms | 193.42 (fps) |
YOLO-v2 (.onnx) | 203.9 MB | [1, 3, 416, 416] | 47.16 (MBytes) | 6.70 (MBytes) | 11.50ms | 86.96 (fps) |
YOLO-v5s (.onnx) | 27.9 MB | [1, 3, 640, 640] | 87.91 (MBytes) | 46.65 (MBytes) | 43.64ms | 22.91 (fps) |
YOLO-v5s-seg (.onnx) | 29.4 MB | [1, 3, 640, 640] | 130.79 (MBytes) | 78.22 (MBytes) | 58.46ms | 17.11 (fps) |
YOLO-v8s-seg (.onnx) | 45 MB | [1, 3, 640, 640] | 163.19 (MBytes) | 101.29 (MBytes) | 64.45ms | 15.52 (fps) |
ArcFace (.onnx) | 248.9 MB | [1, 3, 112, 112] | 46.19 (MBytes) | 5.32 (MBytes) | 17.37ms | 57.57 (fps) |
DeepLab-v3p (.onnx) | 22.1 MB | [1, 3, 640, 640] | 385.65 (MBytes) | 129.15 (MBytes) | 107.76ms | 9.28 (fps) |
3DDFA (.onnx) | 12.4 MB | [1, 3, 120, 120] | 2.03 (MBytes) | 0.35 (MBytes) | 0.55ms | 1818.18 (fps) |
YOLO-v10n (.onnx) | 9.39 MB | [1, 3, 640, 640] | 3204.12 (MBytes) | 3186.14 (MBytes) | 6477.36ms | 0.15 (fps) |
YOLO-v10s (.onnx) | 29.2 MB | [1, 3, 640, 640] | 3258.21 (MBytes) | 3219.47 (MBytes) | 6513.48ms | 0.15 (fps) |
YOLO-v10n - postprocess | 9.39 MB - postprocess | [1, 3, 640, 640] | 46.92 (MBytes) | 33.88 (MBytes) | 36.31ms | 27.54 (fps) |
YOLO-v10s - postprocess | 29.2 MB - postprocess | [1, 3, 640, 640] | 102.23 (MBytes) | 68.58 (MBytes) | 68.81ms | 14.53 (fps) |
Note:
“xxx - postprocess” means removing post-processing (--outputs set to '/model.23/Transpose_output_0').
If you want to refer to more detailed performance data about YOLOV8, please refer here.