
The NPU runs at a clock rate of 900 MHz and delivers up to 4.1 TOPS (trillion operations per second). It is optimized for AI models based on convolutional neural networks and includes a Parallel Processing Unit (PPU) with 32-bit floating-point pipelining and threading.

...

Here are our test results:

Model	Model size (MB)	Input shape [n, c, h, w]	Total DDR read BW (MBytes)	Total DDR write BW (MBytes)	Average inference time (ms)	Frame rate without other latency (fps)
AlexNet (.onnx)	233	[1, 3, 224, 224]	47.03	1.31	8.41	118.91
Inception-v1 (.onnx)	27	[1, 3, 224, 224]	16.7	5.24	3.97	251.89
Inception-v2 (.onnx)	43	[1, 3, 224, 224]	14.47	1.84	7.68	130.21
MobileNet-v2 (.onnx)	14	[1, 3, 224, 224]	5.25	1.24	1.94	515.46
EfficientNet-Lite4 (.onnx)	50	[1, 3, 224, 224]	15.69	4.68	5.00	200.00
ResNet-50 (.onnx)	98	[1, 3, 224, 224]	39.61	13.28	16.29	61.39
SqueezeNet (.onnx)	4.8	[1, 3, 224, 224]	2.33	0.37	1.29	775.19
VGG-16 (.onnx)	528	[1, 3, 224, 224]	121.06	6.97	22.26	44.92
DenseNet-121 (.onnx)	32	[1, 3, 224, 224]	26.55	8.86	21.12	47.35
GoogleNet (.onnx)	27	[1, 3, 224, 224]	15.02	4.89	3.64	274.73
CaffeNet (.onnx)	233	[1, 3, 224, 224]	46.13	0.37	7.09	141.04
ShuffleNet-v2 (.onnx)	8.8	[1, 3, 224, 224]	4.14	1.93	2.09	478.47
SSD-MobilenetV1 (.tflite)	26.2	[1, 320, 320, 3]	11.34	5.21	5.97	167.50
SSD-MobilenetV2 (.tflite)	17.1	[1, 320, 320, 3]	12.21	6.04	5.17	193.42
YOLO-v2 (.onnx)	203.9	[1, 3, 416, 416]	47.16	6.70	11.50	86.96
YOLO-v5s (.onnx)	27.9	[1, 3, 640, 640]	87.91	46.65	43.64	22.91
YOLO-v5s-seg (.onnx)	29.4	[1, 3, 640, 640]	130.79	78.22	58.46	17.11
YOLO-v8s-seg (.onnx)	45	[1, 3, 640, 640]	163.19	101.29	64.45	15.52
ArcFace (.onnx)	248.9	[1, 3, 112, 112]	46.19	5.32	17.37	57.57
DeepLab-v3p (.onnx)	22.1	[1, 3, 640, 640]	385.65	129.15	107.76	9.28
3DDFA (.onnx)	12.4	[1, 3, 120, 120]	2.03	0.35	0.55	1818.18
YOLO-v10n (.onnx)	9.39	[1, 3, 640, 640]	3204.12	3186.14	6477.36	0.15
YOLO-v10s (.onnx)	29.2	[1, 3, 640, 640]	3258.21	3219.47	6513.48	0.15
YOLO-v10n - postprocess	9.39	[1, 3, 640, 640]	46.92	33.88	36.31	27.54
YOLO-v10s - postprocess	29.2	[1, 3, 640, 640]	102.23	68.58	68.81	14.53
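
The frame-rate column appears to be the reciprocal of the average inference time (1000 divided by the time in milliseconds), which is what "without other latency" suggests. The following is a minimal sketch of that relationship. The function name `frame_rate_fps` is hypothetical, and the sample values are copied from the table.

```python
def frame_rate_fps(inference_ms: float) -> float:
    """Frames per second if inference time were the only latency."""
    if inference_ms <= 0:
        raise ValueError("inference time must be positive")
    return 1000.0 / inference_ms


# Average inference times in milliseconds, copied from the table.
samples = {"MobileNet-v2": 1.94, "ResNet-50": 16.29, "YOLO-v5s": 43.64}

for model, ms in samples.items():
    print(f"{model}: {frame_rate_fps(ms):.2f} fps")
```

In a real pipeline, pre-processing, post-processing, and data transfer add to the per-frame time, so the achievable frame rate will be lower than this figure.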

Note:

“xxx - postprocess” means the post-processing stage was removed from the model (--outputs set to '/model.23/Transpose_output_0').
For more detailed YOLOv8 performance data, please refer here.
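
To see how much the removed post-processing stage matters for YOLO-v10, the inference times of the paired rows can be compared directly. This is a rough sketch using only the numbers in the table. The helper name `speedup` is hypothetical, and the result says nothing about where the post-processing would run instead.

```python
def speedup(full_ms: float, trimmed_ms: float) -> float:
    """Ratio of inference time with post-processing to time without it."""
    return full_ms / trimmed_ms


# (time with post-processing, time without post-processing) in ms, from the table.
yolo_v10 = {
    "YOLO-v10n": (6477.36, 36.31),
    "YOLO-v10s": (6513.48, 68.81),
}

for model, (full_ms, trimmed_ms) in yolo_v10.items():
    print(f"{model}: {speedup(full_ms, trimmed_ms):.1f}x faster without post-processing")
```

By this measure the trimmed models run roughly 178 times (YOLO-v10n) and 95 times (YOLO-v10s) faster on the NPU than the full models.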
