The NPU operates at a clock rate of 900 MHz and delivers computing performance of up to 4.1 TOPS (trillion operations per second). It is optimized for AI models based on convolutional neural networks and includes a Parallel Processing Unit (PPU) with 32-bit floating-point pipelining and threading.

...

Here are our test results:

| Model | Model size | Input shape [n, c, h, w] | Total DDR read BW (MBytes) | Total DDR write BW (MBytes) | Average inference time (ms) | Frame rate without other latency (fps) |
|---|---|---|---|---|---|---|
| AlexNet (.onnx) | 233 MB | [1, 3, 224, 224] | 47.03 | 1.31 | 8.41 | 118.91 |
| Inception-v1 (.onnx) | 27 MB | [1, 3, 224, 224] | 16.7 | 5.24 | 3.97 | 251.89 |
| Inception-v2 (.onnx) | 43 MB | [1, 3, 224, 224] | 14.47 | 1.84 | 7.68 | 130.21 |
| MobileNet-v2 (.onnx) | 14 MB | [1, 3, 224, 224] | 5.25 | 1.24 | 1.94 | 515.46 |
| EfficientNet-Lite4 (.onnx) | 50 MB | [1, 3, 224, 224] | 15.69 | 4.68 | 5.00 | 200.00 |
| ResNet-50 (.onnx) | 98 MB | [1, 3, 224, 224] | 39.61 | 13.28 | 16.29 | 61.39 |
| SqueezeNet (.onnx) | 4.8 MB | [1, 3, 224, 224] | 2.33 | 0.37 | 1.29 | 775.19 |
| VGG-16 (.onnx) | 528 MB | [1, 3, 224, 224] | 121.06 | 6.97 | 22.26 | 44.92 |
| DenseNet-121 (.onnx) | 32 MB | [1, 3, 224, 224] | 26.55 | 8.86 | 21.12 | 47.35 |
| GoogleNet (.onnx) | 27 MB | [1, 3, 224, 224] | 15.02 | 4.89 | 3.64 | 274.73 |
| CaffeNet (.onnx) | 233 MB | [1, 3, 224, 224] | 46.13 | 0.37 | 7.09 | 141.04 |
| ShuffleNet-v2 (.onnx) | 8.8 MB | [1, 3, 224, 224] | 4.14 | 1.93 | 2.09 | 478.47 |
| SSD-MobilenetV1 (.tflite) | 26.2 MB | [1, 320, 320, 3] | 11.34 | 5.21 | 5.97 | 167.50 |
| SSD-MobilenetV2 (.tflite) | 17.1 MB | [1, 320, 320, 3] | 12.21 | 6.04 | 5.17 | 193.42 |
| YOLO-v2 (.onnx) | 203.9 MB | [1, 3, 416, 416] | 47.16 | 6.70 | 11.50 | 86.96 |
| YOLO-v5s (.onnx) | 27.9 MB | [1, 3, 640, 640] | 87.91 | 46.65 | 43.64 | 22.91 |
| YOLO-v5s-seg (.onnx) | 29.4 MB | [1, 3, 640, 640] | 130.79 | 78.22 | 58.46 | 17.11 |
| YOLO-v8s-seg (.onnx) | 45 MB | [1, 3, 640, 640] | 163.19 | 101.29 | 64.45 | 15.52 |
| ArcFace (.onnx) | 248.9 MB | [1, 3, 112, 112] | 46.19 | 5.32 | 17.37 | 57.57 |
| DeepLab-v3p (.onnx) | 22.1 MB | [1, 3, 640, 640] | 385.65 | 129.15 | 107.76 | 9.28 |
| 3DDFA (.onnx) | 12.4 MB | [1, 3, 120, 120] | 2.03 | 0.35 | 0.55 | 1818.18 |
| YOLO-v10n (.onnx) | 9.39 MB | [1, 3, 640, 640] | 3204.12 | 3186.14 | 6477.36 | 0.15 |
| YOLO-v10s (.onnx) | 29.2 MB | [1, 3, 640, 640] | 3258.21 | 3219.47 | 6513.48 | 0.15 |
| YOLO-v10n - postprocess | 9.39 MB - postprocess | [1, 3, 640, 640] | 46.92 | 33.88 | 36.31 | 27.54 |
| YOLO-v10s - postprocess | 29.2 MB - postprocess | [1, 3, 640, 640] | 102.23 | 68.58 | 68.81 | 14.53 |

Note:

  1. “xxx - postprocess” means the model was measured with its post-processing removed (--outputs set to '/model.23/Transpose_output_0').

  2. For more detailed performance data about YOLOv8, please refer here.
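
The "Frame rate without other latency" column appears to be simply the reciprocal of the average inference time (for example, 1000 / 8.41 ms ≈ 118.91 fps for AlexNet). The sketch below reproduces that arithmetic for a few rows. The helper names are hypothetical, and the sustained DDR traffic estimate assumes the read/write figures are megabytes moved per inference, which the table does not state explicitly.

```python
def fps_from_latency_ms(latency_ms: float) -> float:
    """Frame rate if NPU inference were the only per-frame cost."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def ddr_traffic_mb_per_s(read_mb: float, write_mb: float, latency_ms: float) -> float:
    """Estimated DDR traffic when running back-to-back inferences.

    ASSUMPTION: read_mb and write_mb are per-inference totals.
    """
    return (read_mb + write_mb) * fps_from_latency_ms(latency_ms)


# (read MBytes, write MBytes, average inference time in ms), copied from the table
rows = {
    "AlexNet": (47.03, 1.31, 8.41),
    "MobileNet-v2": (5.25, 1.24, 1.94),
    "YOLO-v5s": (87.91, 46.65, 43.64),
}

for name, (read_mb, write_mb, latency_ms) in rows.items():
    fps = fps_from_latency_ms(latency_ms)
    traffic = ddr_traffic_mb_per_s(read_mb, write_mb, latency_ms)
    print(f"{name}: {fps:.2f} fps, ~{traffic:.0f} MB/s DDR traffic (estimate)")
```

In a real pipeline the achievable frame rate will be lower, since pre-processing, post-processing, and data transfer add latency that this calculation leaves out.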