This document describes how to use NEON on the SP7350 to accelerate data processing. It covers which scenarios are suitable for NEON, NEON learning materials, and examples of NEON acceleration.
...
There are many YUV formats, but the example code supports only YUYV, whose data is laid out as follows: YUYV YUYV YUYV ... YUYV. It has the following characteristics:
Each pair of Y components (two adjacent pixels) shares one U component and one V component, so every two pixels occupy 4 bytes, i.e. 2 bytes per pixel. A YUYV image with a width of W and a height of H therefore occupies W * H * 2 bytes.
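The layout above can be made concrete with a small sketch that computes the frame size and reads the Y, U and V values of a single pixel. It assumes a tightly packed frame (row stride = width * 2, no padding) and an even width; the helper names yuyv_frame_size and yuyv_get_pixel are hypothetical and not part of any SP7350 API.

```c
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of a packed YUYV frame: each pair of pixels is stored as
 * Y0 U Y1 V (4 bytes), i.e. 2 bytes per pixel. */
static size_t yuyv_frame_size(size_t width, size_t height)
{
    return width * height * 2;
}

/* Read the Y, U and V values of pixel (x, y).
 * Assumption: rows are tightly packed (stride = width * 2) and width is even. */
static void yuyv_get_pixel(const uint8_t *frame, size_t width,
                           size_t x, size_t y,
                           uint8_t *out_y, uint8_t *out_u, uint8_t *out_v)
{
    /* The 4-byte group that holds the pixel pair containing x */
    const uint8_t *group = frame + y * width * 2 + (x / 2) * 4;

    *out_y = group[(x % 2) * 2]; /* Y0 at offset 0, Y1 at offset 2 */
    *out_u = group[1];           /* U shared by both pixels of the pair */
    *out_v = group[3];           /* V shared by both pixels of the pair */
}
```

For example, a 640 x 480 YUYV frame needs 614400 bytes, and pixels 0 and 1 of a row return the same U and V values.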
RGB format
Because NEON provides interleaved load/store instructions (VLD3/VST3, VLD4/VST4) that split packed channels into separate registers on load and re-interleave them on store, the channel order in which RGB data is stored (RGB, BGR, RGBA, BGRA, etc.) has essentially no impact on performance.
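As an illustration of why the channel order does not matter: once vld3q_u8 has split 16 packed pixels into R, G and B registers, converting RGB to BGR is only a matter of which register is stored where. This is a minimal sketch assuming packed 8-bit RGB888 data and non-overlapping source and destination buffers; rgb_to_bgr is a hypothetical helper, and the scalar loop handles the leftover pixels (and the whole image on non-NEON builds).

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Convert packed RGB888 to packed BGR888. `pixels` is the pixel count.
 * src and dst must not overlap (the scalar tail is not in-place safe). */
static void rgb_to_bgr(const uint8_t *src, uint8_t *dst, size_t pixels)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 16 <= pixels; i += 16) {
        /* VLD3 de-interleaves 16 pixels: val[0] = R, val[1] = G, val[2] = B */
        uint8x16x3_t rgb = vld3q_u8(src + i * 3);
        uint8x16x3_t bgr;
        bgr.val[0] = rgb.val[2];
        bgr.val[1] = rgb.val[1];
        bgr.val[2] = rgb.val[0];
        /* VST3 re-interleaves the three registers in the new order */
        vst3q_u8(dst + i * 3, bgr);
    }
#endif
    /* Scalar tail for the remaining pixels */
    for (; i < pixels; i++) {
        dst[i * 3 + 0] = src[i * 3 + 2];
        dst[i * 3 + 1] = src[i * 3 + 1];
        dst[i * 3 + 2] = src[i * 3 + 0];
    }
}
```

Handling RGBA or BGRA follows the same pattern with vld4q_u8/vst4q_u8 and a uint8x16x4_t.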
...
NEON has no native matrix-multiplication instruction, so matrix multiplication has to be built from vector multiply and multiply-accumulate operations.
As shown in the figure above, the first two columns of the output matrix are calculated with the conventional approach, and the following pattern can be identified:
The i-th column of the output matrix is the sum over x of the x-th column of matrix A multiplied by the x-th element of the i-th column of matrix B, where 0 <= i < 4 and 0 <= x < 4.
Construct one vector per column: a0 = (a11, a21, a31, a41), ..., and b0 = (b11, b21, b31, b41), .... The calculation can then be written as follows:
a0 x b0[0] + a1 x b0[1] + a2 x b0[2] + a3 x b0[3] -> the first column of the resulting matrix
a0 x b1[0] + a1 x b1[1] + a2 x b1[2] + a3 x b1[3] -> the second column of the resulting matrix
...
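The column formulation above maps directly onto NEON: each column of A stays in a register, and each element of a column of B is broadcast and multiplied into it. The sketch below assumes 4x4 float matrices stored column-major as m[col * 4 + row] and that c does not alias a or b; mat4_mul_colmajor is a hypothetical name. It uses the vmulq_n_f32 and vmlaq_n_f32 intrinsics from arm_neon.h, with a scalar fallback so it also builds on non-NEON hosts.

```c
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* C = A * B for 4x4 float matrices stored column-major (m[col * 4 + row]).
 * Column i of C = a0*b_i[0] + a1*b_i[1] + a2*b_i[2] + a3*b_i[3],
 * where a0..a3 are the columns of A and b_i is column i of B. */
static void mat4_mul_colmajor(const float *a, const float *b, float *c)
{
#if defined(__ARM_NEON)
    /* Load the four columns of A once */
    float32x4_t a0 = vld1q_f32(a + 0);
    float32x4_t a1 = vld1q_f32(a + 4);
    float32x4_t a2 = vld1q_f32(a + 8);
    float32x4_t a3 = vld1q_f32(a + 12);

    for (int i = 0; i < 4; i++) {
        const float *bi = b + i * 4;                  /* column i of B */
        float32x4_t ci = vmulq_n_f32(a0, bi[0]);      /* a0 * b_i[0] */
        ci = vmlaq_n_f32(ci, a1, bi[1]);              /* + a1 * b_i[1] */
        ci = vmlaq_n_f32(ci, a2, bi[2]);              /* + a2 * b_i[2] */
        ci = vmlaq_n_f32(ci, a3, bi[3]);              /* + a3 * b_i[3] */
        vst1q_f32(c + i * 4, ci);                     /* column i of C */
    }
#else
    /* Scalar reference: C[row][i] = sum_x A[row][x] * B[x][i] */
    for (int i = 0; i < 4; i++) {
        for (int row = 0; row < 4; row++) {
            float sum = 0.0f;
            for (int x = 0; x < 4; x++)
                sum += a[x * 4 + row] * b[i * 4 + x];
            c[i * 4 + row] = sum;
        }
    }
#endif
}
```

No horizontal (across-lane) additions are needed, which is why the column-major layout suits SIMD. On AArch64, vfmaq_laneq_f32 is a fused alternative to vmlaq_n_f32; the two can differ in the last bit of rounding.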
In summary, we can draw the following conclusions:
If the matrices are stored in memory in column-major order, each column can be loaded directly into a vector register, which makes the computation well suited to SIMD (Single Instruction, Multiple Data) processing.
If a matrix is stored in row-major order, NEON's de-interleaving load/store instructions (such as VLD4/VST4) can convert it to column vectors as part of the load.
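For the row-major case, a single vld4q_f32 on a 4x4 float matrix de-interleaves with a stride of 4: register val[j] receives elements j, j+4, j+8 and j+12, which is exactly column j. The sketch below assumes a 4x4 row-major float matrix (m[row * 4 + col]); mat4_rowmajor_to_colmajor is a hypothetical helper, with a scalar transpose as the non-NEON fallback.

```c
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Convert a 4x4 row-major float matrix into column-major storage. */
static void mat4_rowmajor_to_colmajor(const float *row_major, float *col_major)
{
#if defined(__ARM_NEON)
    /* VLD4 de-interleaves: cols.val[j] = {m[j], m[j+4], m[j+8], m[j+12]},
     * i.e. column j of a row-major matrix. */
    float32x4x4_t cols = vld4q_f32(row_major);
    vst1q_f32(col_major + 0, cols.val[0]);
    vst1q_f32(col_major + 4, cols.val[1]);
    vst1q_f32(col_major + 8, cols.val[2]);
    vst1q_f32(col_major + 12, cols.val[3]);
#else
    /* Scalar fallback: plain transpose of the storage order */
    for (int row = 0; row < 4; row++)
        for (int col = 0; col < 4; col++)
            col_major[col * 4 + row] = row_major[row * 4 + col];
#endif
}
```

In a real kernel the cols.val[] registers would usually be used directly as a0..a3 rather than stored back to memory.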
If the original data is not in column-major order
...