
How to accelerate float32-to-dtype conversion on the CPU


Created by wang.zhou, last modified on Apr 25, 2024


If the PPU is not used to accelerate vsi_nn_Float32ToDtype(), the following CPU-side changes can be used to speed up this conversion instead.

Using the VSI-generated code yolov5s_uint8_nbg_unify as an example, the generated project needs some modifications for the target.

This document applies to both the uint8 and int16 formats.
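A table lookup can replace the per-element conversion because the values being converted originate from 8-bit image pixels. As a sketch under stated assumptions: suppose the pre-processing computes x = (p - m) s from a pixel value p with mean m and scale s, that uint8 tensors use asymmetric affine quantization (scale S, zero point Z), and that int16 tensors use dynamic fixed point with fl fractional bits. These parameter names and formulas are assumptions about the generated code, not taken from it. Under them, each output depends on p alone:

```latex
\begin{aligned}
x &= (p - m)\,s, \qquad p \in \{0, 1, \dots, 255\} \\
q_{\text{uint8}} &= \operatorname{clamp}\!\left(\operatorname{round}\!\left(\frac{x}{S}\right) + Z,\; 0,\; 255\right) \\
q_{\text{int16}} &= \operatorname{clamp}\!\left(\operatorname{round}\!\left(x \cdot 2^{fl}\right),\; -32768,\; 32767\right)
\end{aligned}
```

Since p can take only 256 values, both results can be computed once into a 256-entry table T, after which each conversion reduces to q = T[p].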

Steps

Add the vnn_PreTableInit() declaration to vnn_pre_process.h.

In vnn_pre_process.c, add vnn_PreTableInit() and the uint8-to-dtype lookup table named u2d used by the pre-processing.
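A minimal sketch of what the table initialization might look like. The names pre_table_init_u8, pre_table_init_i16, u2d_u8 and u2d_i16 are hypothetical stand-ins for vnn_PreTableInit() and u2d, and the normalization (mean, scale) and quantization formulas (asymmetric affine for uint8, dynamic fixed point for int16) are assumptions. In the real code, filling each entry by running the existing float32-to-dtype conversion once per pixel value would keep the table bit-exact with the original path; a self-contained quantizer is used here so the sketch compiles on its own.

```c
#include <stdint.h>

/* Hypothetical u2d tables: entry p holds the converted value for input pixel value p. */
static uint8_t u2d_u8[256];
static int16_t u2d_i16[256];

/* Clamp v to [lo, hi] (integer-valued bounds), then round half away from zero.
   Clamping first keeps the float-to-long cast in range. */
static long round_clamped(float v, float lo, float hi)
{
    if (v < lo) v = lo;
    if (v > hi) v = hi;
    return (long)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}

/* uint8 target, assumed asymmetric affine quantization:
   x = (p - mean) * scale;  q = clamp(round(x / qscale) + zero_point, 0, 255). */
void pre_table_init_u8(float mean, float scale, float qscale, int32_t zero_point)
{
    for (int p = 0; p < 256; ++p) {
        float x = ((float)p - mean) * scale;
        long r = round_clamped(x / qscale, (float)-zero_point, (float)(255 - zero_point));
        u2d_u8[p] = (uint8_t)(r + zero_point);
    }
}

/* int16 target, assumed dynamic fixed point with fl fractional bits:
   x = (p - mean) * scale;  q = clamp(round(x * 2^fl), -32768, 32767). */
void pre_table_init_i16(float mean, float scale, int fl)
{
    float mult = 1.0f;                      /* 2^fl, computed exactly without <math.h> */
    for (int k = 0; k < fl; ++k) mult *= 2.0f;
    for (int k = 0; k > fl; --k) mult *= 0.5f;

    for (int p = 0; p < 256; ++p) {
        float x = ((float)p - mean) * scale;
        u2d_i16[p] = (int16_t)round_clamped(x * mult, -32768.0f, 32767.0f);
    }
}
```

Building the table costs only 256 conversions regardless of image size, which is why it is cheap compared with converting every pixel.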

Modify the original _float32_to_dtype() to use the table lookup instead of calling the VSI API directly.
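A sketch of the lookup-based conversion, assuming the pre-processing keeps the decoded uint8 pixels available instead of first expanding them to float32. The function names are hypothetical; the point is that the per-element arithmetic (normalize, divide, round, clamp) becomes a single indexed load.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup conversion for a uint8 output tensor: one table read per element. */
static void u8_to_dtype_u8(const uint8_t *pixels, uint8_t *out, size_t n,
                           const uint8_t table[256])
{
    for (size_t i = 0; i < n; ++i)
        out[i] = table[pixels[i]];
}

/* Same idea for an int16 output tensor. */
static void u8_to_dtype_i16(const uint8_t *pixels, int16_t *out, size_t n,
                            const int16_t table[256])
{
    for (size_t i = 0; i < n; ++i)
        out[i] = table[pixels[i]];
}
```

The 256-entry table fits easily in the CPU's L1 cache, so the loop is typically limited by memory bandwidth over the image rather than by arithmetic.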

In main.c, create the u2d table before opening the image.
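A sketch of the resulting order in main(): the table is built once before any image is opened, so its cost is paid once rather than per image. All names are hypothetical, build_u2d() uses a simplified stand-in quantizer (mean 128, unit scales), and the `images` array stands in for the decoded output of the real image loader.

```c
#include <stddef.h>
#include <stdint.h>

static uint8_t u2d[256];

/* Hypothetical stand-in for vnn_PreTableInit(): assumes mean 128 and unit scales,
   so q = clamp(p - 128 + zero_point, 0, 255). The real table uses the model's parameters. */
static void build_u2d(int32_t zero_point)
{
    for (int p = 0; p < 256; ++p) {
        int32_t q = p - 128 + zero_point;
        u2d[p] = (uint8_t)(q < 0 ? 0 : (q > 255 ? 255 : q));
    }
}

/* Per-image conversion through the table. */
static void convert_image(const uint8_t *pixels, uint8_t *tensor, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        tensor[i] = u2d[pixels[i]];
}

/* Order used in main(): build the table once, then open and convert each image. */
static void run_pipeline(const uint8_t *const *images, size_t num_images,
                         size_t pixels_per_image, uint8_t *tensor, int32_t zero_point)
{
    build_u2d(zero_point);                       /* once, before any image is opened */
    for (size_t k = 0; k < num_images; ++k)
        convert_image(images[k], tensor + k * pixels_per_image, pixels_per_image);
}
```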

Add the OpenMP (omp) option to the Makefile.
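A sketch of how the OpenMP option might be used, assuming the lookup loop is parallelized with a `parallel for` pragma (the function name is hypothetical). Each iteration writes a distinct output element, so iterations are independent. With a GCC toolchain the option is typically -fopenmp, added to both the compile and link flags; if OpenMP is not enabled, the pragma is ignored and the loop runs serially with identical results.

```c
#include <stdint.h>

/* Table-lookup conversion split across CPU cores with OpenMP.
   A signed loop index is used for compatibility with older OpenMP versions. */
static void u8_to_dtype_i16_omp(const uint8_t *pixels, int16_t *out, long n,
                                const int16_t table[256])
{
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        out[i] = table[pixels[i]];
}
```

For very small images, the cost of starting the thread team can outweigh the gain, so it is worth timing both variants on the target.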

Test results

The following figure shows the performance measured after applying the modifications described in the steps above.

uint8

Creating the table took 0.05 ms, and the table-lookup conversion took 4.70 ms.

int16

Creating the table took 0.05 ms, and the table-lookup conversion took 5.09 ms.
