The NPU runs at clock rates of up to 900 MHz and delivers up to 4.5 TOPS (trillion operations per second). It is optimized for AI models based on convolutional neural networks and includes a Parallel Processing Unit (PPU) with 32-bit floating-point pipelining and threading.
...
Model | Model size | Input shape [n, c, h, w] | Total DDR read BW (MBytes) | Total DDR write BW (MBytes) | Average inference time (ms) | Frame rate without other latency (fps) |
AlexNet (.onnx) | 233 MB | [1, 3, 224, 224] | 47.03 | 1.31 | 8.41 | 118.91 |
Inception-v1 (.onnx) | 27 MB | [1, 3, 224, 224] | 16.7 | 5.24 | 3.97 | 251.89 |
Inception-v2 (.onnx) | 43 MB | [1, 3, 224, 224] | 14.47 | 1.84 | 7.68 | 130.21 |
MobileNet-v2 (.onnx) | 14 MB | [1, 3, 224, 224] | 5.25 | 1.24 | 1.94 | 515.46 |
EfficientNet-Lite4 (.onnx) | 50 MB | [1, 3, 224, 224] | 15.69 | 4.68 | 5.00 | 200.00 |
ResNet-50 (.onnx) | 98 MB | [1, 3, 224, 224] | 39.61 | 13.28 | 16.29 | 61.39 |
SqueezeNet (.onnx) | 4.8 MB | [1, 3, 224, 224] | 2.33 | 0.37 | 1.29 | 775.19 |
VGG-16 (.onnx) | 528 MB | [1, 3, 224, 224] | 121.06 | 6.97 | 22.26 | 44.92 |
DenseNet-121 (.onnx) | 32 MB | [1, 3, 224, 224] | 26.55 | 8.86 | 21.12 | 47.35 |
GoogleNet (.onnx) | 27 MB | [1, 3, 224, 224] | 15.02 | 4.89 | 3.64 | 274.73 |
CaffeNet (.onnx) | 233 MB | [1, 3, 224, 224] | 46.13 | 0.37 | 7.09 | 141.04 |
ShuffleNet-v2 (.onnx) | 8.8 MB | [1, 3, 224, 224] | 4.14 | 1.93 | 2.09 | 478.47 |
SSD-MobilenetV1 (.tflite) | 26.2 MB | [1, 320, 320, 3] | 11.34 | 5.21 | 5.97 | 167.50 |
SSD-MobilenetV2 (.tflite) | 17.1 MB | [1, 320, 320, 3] | 12.21 | 6.04 | 5.17 | 193.42 |
YOLO-v2 (.onnx) | 203.9 MB | [1, 3, 416, 416] | 47.16 | 6.70 | 11.50 | 86.96 |
YOLO-v5s (.onnx) | 27.9 MB | [1, 3, 640, 640] | 87.91 | 46.65 | 43.64 | 22.91 |
YOLO-v5s-seg (.onnx) | 29.4 MB | [1, 3, 640, 640] | 130.79 | 78.22 | 58.46 | 17.11 |
YOLO-v8s-seg (.onnx) | 45 MB | [1, 3, 640, 640] | 163.19 | 101.29 | 64.45 | 15.52 |
ArcFace (.onnx) | 248.9 MB | [1, 3, 112, 112] | 46.19 | 5.32 | 17.37 | 57.57 |
DeepLab-v3p (.onnx) | 22.1 MB | [1, 3, 640, 640] | 385.65 | 129.15 | 107.76 | 9.28 |
3DDFA (.onnx) | 12.4 MB | [1, 3, 120, 120] | 2.03 | 0.35 | 0.55 | 1818.18 |
YOLO-v10n (.onnx) | 9.39 MB | [1, 3, 640, 640] | 3204.12 | 3186.14 | 6477.36 | 0.15 |
YOLO-v10s (.onnx) | 29.2 MB | [1, 3, 640, 640] | 3258.21 | 3219.47 | 6513.48 | 0.15 |
YOLO-v10n - postprocess | 9.39 MB - postprocess | [1, 3, 640, 640] | 46.92 | 33.88 | 36.31 | 27.54 |
...
YOLO-v10s - postprocess | 29.2 MB - postprocess | [1, 3, 640, 640] | 102.23 | 68.58 | 68.81 | 14.53 |
Note:
“xxx - postprocess” means the model was run with its post-processing removed (--outputs set to '/model.23/Transpose_output_0').
For more detailed performance data on YOLOv8, please refer here.
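The "Frame rate without other latency" column appears to be the reciprocal of the average inference time: 1000 divided by the time in milliseconds. The short Python sketch below reproduces a few table values under that assumption. The function name `frames_per_second` is hypothetical and not part of any NPU toolkit, and the result is an upper bound that ignores pre-processing, post-processing, and data transfer.

```python
def frames_per_second(inference_ms: float) -> float:
    """Upper-bound frame rate, assuming inference time is the only latency."""
    if inference_ms <= 0:
        raise ValueError("inference time must be positive")
    # 1000 ms per second divided by the time spent on one frame
    return 1000.0 / inference_ms


# Average inference times (ms) copied from the table above
samples = {"AlexNet": 8.41, "MobileNet-v2": 1.94, "ResNet-50": 16.29}

for name, ms in samples.items():
    print(f"{name}: {frames_per_second(ms):.2f} fps")
```

The same relation shows why the two YOLO-v10 rows with post-processing included report about 0.15 fps: an inference time above 6 seconds per frame leaves room for less than one frame every six seconds.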