The NPU operates at clock rates of up to 900 MHz, delivering computing performance of up to 4.1 TOPS (Trillion Operations Per Second). Optimized for AI models based on convolutional neural networks, it includes a Parallel Processing Unit (PPU) with 32-bit floating-point pipelining and threading.

...

C3V CPU uses Quad-CA55@1.5GHz with 4GB DRAM.
C3V NPU uses VIP9000@900MHz with 128MB reserved memory.
NN Tools: ACUITY v6.30.x
NPU Kernel driver: v6.4.18.5
NN model quantization type: int8

Here are the test results:

Model	Model Size	Input shape [n, c, h, w]	Total (DDR) read BW	Total (DDR) write BW	Average inference time	Frame rate without other latency
AlexNet (.onnx)	233 MB	[1, 3, 224, 224]	47.03 (MBytes)	1.31 (MBytes)	8.41ms	118.91 (fps)
Inception-v1 (.onnx)	27 MB	[1, 3, 224, 224]	16.7 (MBytes)	5.24 (MBytes)	3.97ms	251.89 (fps)
Inception-v2 (.onnx)	43 MB	[1, 3, 224, 224]	14.47 (MBytes)	1.84 (MBytes)	7.68ms	130.21 (fps)
MobileNet-v2 (.onnx)	14 MB	[1, 3, 224, 224]	5.25 (MBytes)	1.24 (MBytes)	1.94ms	515.46 (fps)
EfficientNet-Lite4 (.onnx)	50 MB	[1, 3, 224, 224]	15.69 (MBytes)	4.68 (MBytes)	5.00ms	200.00 (fps)
ResNet-50 (.onnx)	98 MB	[1, 3, 224, 224]	39.61 (MBytes)	13.28 (MBytes)	16.29ms	61.39 (fps)
SqueezeNet (.onnx)	4.8 MB	[1, 3, 224, 224]	2.33 (MBytes)	0.37 (MBytes)	1.29ms	775.19 (fps)
VGG-16 (.onnx)	528 MB	[1, 3, 224, 224]	121.06 (MBytes)	6.97 (MBytes)	22.26ms	44.92 (fps)
DenseNet-121 (.onnx)	32 MB	[1, 3, 224, 224]	26.55 (MBytes)	8.86 (MBytes)	21.12ms	47.35 (fps)
GoogleNet (.onnx)	27 MB	[1, 3, 224, 224]	15.02 (MBytes)	4.89 (MBytes)	3.64ms	274.73 (fps)
CaffeNet (.onnx)	233 MB	[1, 3, 224, 224]	46.13 (MBytes)	0.37 (MBytes)	7.09ms	141.04 (fps)
ShuffleNet-v2 (.onnx)	8.8 MB	[1, 3, 224, 224]	4.14 (MBytes)	1.93 (MBytes)	2.09ms	478.47 (fps)
SSD-MobilenetV1 (.tflite)	26.2 MB	[1, 320, 320, 3]	11.34 (MBytes)	5.21 (MBytes)	5.97ms	167.50 (fps)
SSD-MobilenetV2 (.tflite)	17.1 MB	[1, 320, 320, 3]	12.21 (MBytes)	6.04 (MBytes)	5.17ms	193.42 (fps)
YOLO-v2 (.onnx)	203.9 MB	[1, 3, 416, 416]	47.16 (MBytes)	6.70 (MBytes)	11.50ms	86.96 (fps)
YOLO-v5s (.onnx)	27.9 MB	[1, 3, 640, 640]	87.91 (MBytes)	46.65 (MBytes)	43.64ms	22.91 (fps)
YOLO-v5s-seg (.onnx)	29.4 MB	[1, 3, 640, 640]	130.79 (MBytes)	78.22 (MBytes)	58.46ms	17.11 (fps)
YOLO-v8s-seg (.onnx)	45 MB	[1, 3, 640, 640]	163.19 (MBytes)	101.29 (MBytes)	64.45ms	15.52 (fps)
ArcFace (.onnx)	248.9 MB	[1, 3, 112, 112]	46.19 (MBytes)	5.32 (MBytes)	17.37ms	57.57 (fps)
DeepLab-v3p (.onnx)	22.1 MB	[1, 3, 640, 640]	385.65 (MBytes)	129.15 (MBytes)	107.76ms	9.28 (fps)
3DDFA (.onnx)	12.4 MB	[1, 3, 120, 120]	2.03 (MBytes)	0.35 (MBytes)	0.55ms	1818.18 (fps)
YOLO-v10n (.onnx)	9.39 MB	[1, 3, 640, 640]	3204.12 (MBytes)	3186.14 (MBytes)	6477.36ms	0.15 (fps)
YOLO-v10s (.onnx)	29.2 MB	[1, 3, 640, 640]	3258.21 (MBytes)	3219.47 (MBytes)	6513.48ms	0.15 (fps)
YOLO-v10n - postprocess	9.39 MB - postprocess	[1, 3, 640, 640]	46.92 (MBytes)	33.88 (MBytes)	36.31ms	27.54 (fps)

...


YOLO-v10s - postprocess	29.2 MB - postprocess	[1, 3, 640, 640]	102.23 (MBytes)	68.58 (MBytes)	68.81ms	14.53 (fps)

Note:

“xxx - postprocess” means the model was converted with its post-processing stage removed (--outputs set to '/model.23/Transpose_output_0').
For more detailed performance data about YOLOv8, please refer here.
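
The "Frame rate without other latency" column appears to be simply the reciprocal of the average inference time, i.e. 1000 divided by the time in milliseconds. The short Python sketch below re-derives it for a few rows copied from the table. The function name `frame_rate_fps` and the `ROWS` list are illustrative only and are not part of any Sunplus or ACUITY tool.

```python
def frame_rate_fps(inference_ms: float) -> float:
    """Frames per second if each frame costs only the NPU inference time."""
    if inference_ms <= 0:
        raise ValueError("inference time must be positive")
    return 1000.0 / inference_ms


# (model, average inference time in ms, frame rate in fps) copied from the table
ROWS = [
    ("AlexNet", 8.41, 118.91),
    ("MobileNet-v2", 1.94, 515.46),
    ("ResNet-50", 16.29, 61.39),
    ("YOLO-v5s", 43.64, 22.91),
]

for name, ms, table_fps in ROWS:
    # Print the derived value next to the published one for comparison.
    print(f"{name}: {frame_rate_fps(ms):.2f} fps (table: {table_fps:.2f})")
```

Because this figure excludes pre-processing, post-processing, and data transfer, the frame rate of a complete application pipeline will be lower.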
