If the PPU is not used to accelerate the vsi_nn_Float32ToDtype()
conversion, the following CPU method can be used to speed it up instead.
The VSI-generated code (for example, yolov5s_uint8_nbg_unify) needs a few modifications for the target.
This document applies to both the uint8 and int16 formats.
Steps
Add vnn_PreTableInit to vnn_pre_process.h.
Add vnn_PreTableInit and the uint8-to-dtype table, named u2d, to vnn_pre_process.c to serve as the preprocessing lookup table.
Modify the original _float32_to_dtype() to use a table lookup instead of calling the VSI API directly.
Create the u2d table in main.c before opening the image.
Add the OpenMP (omp) option to the Makefile.
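
The steps above can be sketched as follows for the uint8 case. This is a minimal illustration under stated assumptions, not the code generated by the VSI tools: the signature of vnn_PreTableInit, the quant_params_t structure, the quantize_u8() stand-in for the per-element conversion, and the convert_with_table() helper are all hypothetical. It also assumes that the input pixels are 8-bit, so only 256 distinct inputs exist, and that the dtype uses asymmetric affine quantization. For int16, the table entries would be int16_t with the matching quantization rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quantization parameters; in the generated code these would
 * come from the input tensor's dtype attributes. */
typedef struct {
    float scale;        /* quantization scale, must be > 0 */
    int32_t zero_point; /* quantization zero point */
} quant_params_t;

/* 256-entry table: raw uint8 pixel value -> quantized uint8 tensor value. */
static uint8_t u2d[256];

/* Stand-in for the per-element float-to-dtype conversion (assumed
 * asymmetric affine uint8 quantization, round half away from zero). */
static uint8_t quantize_u8(float value, const quant_params_t *qp)
{
    float scaled = value / qp->scale;
    long q = (long)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
    q += qp->zero_point;
    if (q < 0) q = 0;
    if (q > 255) q = 255;
    return (uint8_t)q;
}

/* Build the table once, before any image is opened. The preprocessing is
 * assumed to be f = (pixel - mean) * norm_scale (signature is hypothetical). */
void vnn_PreTableInit(float mean, float norm_scale, const quant_params_t *qp)
{
    assert(qp != NULL && qp->scale > 0.0f);
    for (int i = 0; i < 256; i++) {
        float f = ((float)i - mean) * norm_scale;
        u2d[i] = quantize_u8(f, qp);
    }
}

/* Convert a whole image with one table lookup per element. The pragma only
 * takes effect when the compiler's OpenMP option is enabled. */
void convert_with_table(const uint8_t *src, uint8_t *dst, size_t count)
{
    #pragma omp parallel for
    for (long i = 0; i < (long)count; i++) {
        dst[i] = u2d[src[i]];
    }
}
```

With GCC, the OpenMP option is -fopenmp, passed at both compile and link time. Without it the loop still runs correctly on a single thread, because the pragma is ignored.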
Test result
The following figure shows the performance measured after applying the modifications described in the steps above.
uint8
Creating the table took 0.05 ms, and the table-lookup conversion took 4.70 ms.
int16
Creating the table took 0.05 ms, and the table-lookup conversion took 5.09 ms.
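
Timings of this kind can be collected with a small helper such as the one below. This is only a sketch of one possible approach, assuming a POSIX target with clock_gettime(); the helper names now_monotonic() and elapsed_ms() are hypothetical, and the original measurements may have been taken differently.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Read the monotonic clock, which is not affected by wall-clock changes. */
static struct timespec now_monotonic(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts;
}

/* Elapsed time between two timestamps, in milliseconds. */
static double elapsed_ms(const struct timespec *start,
                         const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec) * 1000.0
         + (double)(end->tv_nsec - start->tv_nsec) / 1.0e6;
}
```

To time one stage, take a timestamp immediately before and after it (for example, around the table creation and around the table-lookup conversion) and pass both to elapsed_ms().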