The NPU operates at clock rates of up to 900 MHz, delivering computing performance of up to 4.5 1 TOPS (Trillion Operations Per Second). Optimized for AI models based on convolutional neural networks, it includes a Parallel Processing Unit (PPU) with 32-bit floating-point pipelining and threading.

...

| Model | Model size | Input shape [n, c, h, w] | Total DDR read BW (MBytes) | Total DDR write BW (MBytes) | Average inference time (ms) | Frame rate without other latency (fps) |
|---|---|---|---|---|---|---|
| AlexNet (.onnx) | 233 MB | [1, 3, 224, 224] | 47.03 | 1.31 | 8.41 | 118.91 |
| Inception-v1 (.onnx) | 27 MB | [1, 3, 224, 224] | 16.7 | 5.24 | 3.97 | 251.89 |
| Inception-v2 (.onnx) | 43 MB | [1, 3, 224, 224] | 14.47 | 1.84 | 7.68 | 130.21 |
| MobileNet-v2 (.onnx) | 14 MB | [1, 3, 224, 224] | 5.25 | 1.24 | 1.94 | 515.46 |
| EfficientNet-Lite4 (.onnx) | 50 MB | [1, 3, 224, 224] | 15.69 | 4.68 | 5.00 | 200.00 |
| ResNet-50 (.onnx) | 98 MB | [1, 3, 224, 224] | 39.61 | 13.28 | 16.29 | 61.39 |
| SqueezeNet (.onnx) | 4.8 MB | [1, 3, 224, 224] | 2.33 | 0.37 | 1.29 | 775.19 |
| VGG-16 (.onnx) | 528 MB | [1, 3, 224, 224] | 121.06 | 6.97 | 22.26 | 44.92 |
| DenseNet-121 (.onnx) | 32 MB | [1, 3, 224, 224] | 26.55 | 8.86 | 21.12 | 47.35 |
| GoogleNet (.onnx) | 27 MB | [1, 3, 224, 224] | 15.02 | 4.89 | 3.64 | 274.73 |
| CaffeNet (.onnx) | 233 MB | [1, 3, 224, 224] | 46.13 | 0.37 | 7.09 | 141.04 |
| ShuffleNet-v2 (.onnx) | 8.8 MB | [1, 3, 224, 224] | 4.14 | 1.93 | 2.09 | 478.47 |
| SSD-MobilenetV1 (.tflite) | 26.2 MB | [1, 320, 320, 3] | 11.34 | 5.21 | 5.97 | 167.50 |
| SSD-MobilenetV2 (.tflite) | 17.1 MB | [1, 320, 320, 3] | 12.21 | 6.04 | 5.17 | 193.42 |
| YOLO-v2 (.onnx) | 203.9 MB | [1, 3, 416, 416] | 47.16 | 6.70 | 11.50 | 86.96 |
| YOLO-v5s (.onnx) | 27.9 MB | [1, 3, 640, 640] | 87.91 | 46.65 | 43.64 | 22.91 |
| YOLO-v5s-seg (.onnx) | 29.4 MB | [1, 3, 640, 640] | 130.79 | 78.22 | 58.46 | 17.11 |
| YOLO-v8s-seg (.onnx) | 45 MB | [1, 3, 640, 640] | 163.19 | 101.29 | 64.45 | 15.52 |
| ArcFace (.onnx) | 248.9 MB | [1, 3, 112, 112] | 46.19 | 5.32 | 17.37 | 57.57 |
| DeepLab-v3p (.onnx) | 22.1 MB | [1, 3, 640, 640] | 385.65 | 129.15 | 107.76 | 9.28 |
| 3DDFA (.onnx) | 12.4 MB | [1, 3, 120, 120] | 2.03 | 0.35 | 0.55 | 1818.18 |
| YOLO-v10n (.onnx) | 9.39 MB | [1, 3, 640, 640] | 3204.12 | 3186.14 | 6477.36 | 0.15 |
| YOLO-v10s (.onnx) | 29.2 MB | [1, 3, 640, 640] | 3258.21 | 3219.47 | 6513.48 | 0.15 |
| YOLO-v10n - postprocess | 9.39 MB - postprocess | [1, 3, 640, 640] | 46.92 | 33.88 | 36.31 | 27.54 |
| ... | | | | | | |
| YOLO-v10s - postprocess | 29.2 MB - postprocess | [1, 3, 640, 640] | 102.23 | 68.58 | 68.81 | 14.53 |

Note:

  1. "xxx - postprocess" means the model was run with its post-processing stage removed (--outputs set to '/model.23/Transpose_output_0').

  2. For more detailed YOLO-v8 performance data, refer here.
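
The "Frame rate without other latency" column appears to be simply the reciprocal of the average inference time (1000 / time in ms). The minimal Python sketch below reproduces that arithmetic for a few rows of the table. The helper names are hypothetical. The DDR traffic estimate additionally assumes that the read and write columns are MBytes transferred per inference, which the page does not state explicitly.

```python
def frame_rate_fps(inference_ms: float) -> float:
    """Upper-bound frame rate when inference time is the only latency."""
    if inference_ms <= 0:
        raise ValueError("inference time must be positive")
    return 1000.0 / inference_ms


def ddr_traffic_mb_per_s(read_mb: float, write_mb: float, inference_ms: float) -> float:
    """Estimated sustained DDR traffic in MBytes/s.

    ASSUMPTION: read_mb and write_mb are MBytes moved per single inference.
    """
    return (read_mb + write_mb) * frame_rate_fps(inference_ms)


# (model, read MBytes, write MBytes, average inference time in ms),
# copied from the table above
rows = [
    ("MobileNet-v2", 5.25, 1.24, 1.94),
    ("ResNet-50", 39.61, 13.28, 16.29),
    ("YOLO-v5s", 87.91, 46.65, 43.64),
]

for name, read_mb, write_mb, ms in rows:
    fps = frame_rate_fps(ms)
    traffic = ddr_traffic_mb_per_s(read_mb, write_mb, ms)
    print(f"{name}: {fps:.2f} fps, approx. {traffic:.0f} MBytes/s DDR traffic")
```

The computed frame rates match the table (for example 515.46 fps for MobileNet-v2). Real pipelines add pre-processing, post-processing, and data-transfer latency, so achieved frame rates will be lower.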