Scenario introduction

When a camera stream is recorded (or displayed) and fed to a CNN at the same time, the frames delivered by the system are usually in a YUV format, whereas the CNN requires BGR input. Using OpenCV to convert an FHD YUV frame to BGR and scale it down to 320*180 usually takes more than 10 ms, which is already a large share of the roughly 33 ms available per frame at the 30 FPS we hope to achieve. A more efficient method is therefore desirable.

OpenCV Code

Below is an example using OpenCV that converts a YUYV or UYVY image to BGR format and scales it down to half of its original width and height:

/* cvt color & scale down to width / 2, height / 2 */
if (FILE_FORMAT == YUV_FMT_YUYV) {
    cv::cvtColor(yuvImage, bgrImage, cv::COLOR_YUV2BGR_YUYV);
}
else {
    cv::cvtColor(yuvImage, bgrImage, cv::COLOR_YUV2BGR_UYVY);
}
cv::resize(bgrImage, resizedBgrImage, cv::Size(FILE_WIDTH / 2, FILE_HEIGHT / 2), 0, 0, cv::INTER_LINEAR);

Performance comparison (1080P)

We named the efficient YUV -> BGR resize & convert method `YUV Converter`.

As an example, we convert a 1080P YUV image to BGR and scale it down to 320*180. The measured efficiency comparison is as follows:

The "OpenCV" row represents the use of OpenCV entirely for format conversion and scaling operations.
The "YUV Converter" row represents the use of the interface provided in this example entirely for format conversion and scaling operations.

YUV Converter Introduction

YUV Converter is sample code for YUV to BGR conversion and scaling, written specifically for the C3V platform. Its key features are as follows:

Supports YUV to BGR conversion.
Allows scaling the image size to a specified ratio while converting from YUV to BGR.
Supports YUYV and UYVY formats, which are commonly used on the C3V platform.
Utilizes ARM NEON acceleration, requiring only the inclusion of relevant .c and .h files, without the need for installing a bulky OpenCV library.
The efficiency of conversion plus scaling is higher than OpenCV. For example, using OpenCV to convert a 1080P YUV image to BGR and scale it to half the original size takes approximately 14ms, while the sample code can complete the same task in about 3ms.
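
For readers unfamiliar with the arithmetic behind a YUV to BGR conversion, the following is a minimal scalar sketch for a single pixel. It assumes the commonly used BT.601 limited-range coefficients in 8-bit fixed point; the coefficients and rounding used by the YUV Converter sample itself are not stated here and may differ. The names `BgrPixel`, `clampToByte`, and `yuvToBgrPixel` are hypothetical and are not part of the sample code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// One output pixel in the B, G, R byte order expected by the CNN input.
struct BgrPixel { uint8_t b, g, r; };

// Clamp an intermediate result to the 0..255 range of an 8-bit channel.
static uint8_t clampToByte(int value) {
    return static_cast<uint8_t>(std::min(255, std::max(0, value)));
}

// Convert one Y/U/V triple to BGR.
// Assumption: BT.601 limited range (Y in 16..235, U/V centred on 128),
// with coefficients scaled by 256 so that only integer math is needed.
BgrPixel yuvToBgrPixel(int y, int u, int v) {
    assert(y >= 0 && y <= 255 && u >= 0 && u <= 255 && v >= 0 && v <= 255);
    const int c = y - 16;
    const int d = u - 128;
    const int e = v - 128;
    BgrPixel out;
    out.b = clampToByte((298 * c + 516 * d + 128) / 256);
    out.g = clampToByte((298 * c - 100 * d - 208 * e + 128) / 256);
    out.r = clampToByte((298 * c + 409 * e + 128) / 256);
    return out;
}
```

A NEON implementation would typically apply the same integer arithmetic to several pixels per instruction, which is where the speed-up over a per-pixel loop comes from.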

How to use

Introduction to Core Files

sunplus@ubuntu:~/workspace/neon/optimize_samples$ ls -l sources/converter/
total 44
-rw-rw-r-- 1 sunplus sunplus  9253 May 15 03:32 YUVConverter.c
-rw-rw-r-- 1 sunplus sunplus   732 May 15 03:32 YUVConverter.h
-rw-rw-r-- 1 sunplus sunplus 13852 May 15 03:25 YUVConverterScale.c
-rw-rw-r-- 1 sunplus sunplus  1077 May 15 03:25 YUVConverterScale.h
-rw-rw-r-- 1 sunplus sunplus   492 May 15 03:25 YUVConverterTypes.h

YUVConverter.c/h: These files convert YUV images to BGR format without changing their size.
YUVConverterScale.c/h: These files convert YUV images to BGR format while scaling them down by a specified factor.
YUVConverterTypes.h: This header file defines the supported YUV formats. The example code only supports the two formats commonly used on C3V: YUYV and UYVY.
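
The two supported formats differ only in byte order inside each 4-byte pixel pair. The sketch below illustrates that layout; `PackedYuvFormat`, `YuvSample`, and `readPackedPixel` are hypothetical names chosen for illustration and are not the definitions found in YUVConverterTypes.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the format constants in YUVConverterTypes.h.
enum class PackedYuvFormat { Yuyv, Uyvy };

struct YuvSample { uint8_t y, u, v; };

// Read the Y, U and V values that apply to pixel `x` of one packed 4:2:2 row.
// Two horizontally adjacent pixels share one U and one V sample:
//   YUYV byte order: Y0 U Y1 V
//   UYVY byte order: U Y0 V Y1
YuvSample readPackedPixel(PackedYuvFormat format, const uint8_t* row, uint32_t x) {
    assert(row != nullptr);
    const uint8_t* pair = row + static_cast<size_t>(x / 2) * 4;  // 4 bytes per pixel pair
    const uint32_t odd = x & 1u;                                  // 0 = first pixel of the pair
    if (format == PackedYuvFormat::Yuyv) {
        return { pair[odd * 2], pair[1], pair[3] };
    }
    return { pair[odd * 2 + 1], pair[0], pair[2] };
}
```

Because chroma is shared per pair, both formats average 2 bytes per pixel.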

Introduction to Test Files

sunplus@ubuntu:~/workspace/neon/optimize_samples$ ls sources/examples/ -l
total 32
-rw-rw-r-- 1 sunplus sunplus 2303 May 15 05:24 converter_cv.cpp
-rw-rw-r-- 1 sunplus sunplus 4643 May 15 05:22 main.cpp
-rw-rw-r-- 1 sunplus sunplus 4875 May 15 03:25 MainTestRunner.cpp
-rw-rw-r-- 1 sunplus sunplus 1197 May 15 03:25 MainTestRunner.h
-rw-rw-r-- 1 sunplus sunplus 1359 May 15 03:25 MainTestUtil.cpp
-rw-rw-r-- 1 sunplus sunplus  612 May 15 03:25 MainTestUtil.h

converter_cv.cpp: An example of conversion and scaling based on OpenCV.
main.cpp, MainTestRunner.cpp/h, MainTestUtil.cpp/h: Test files for the interfaces in YUVConverter.h and YUVConverterScale.h.

sunplus@ubuntu:~/workspace/neon/optimize_samples$ ls -al
total 5900
-rw-rw-r-- 1 sunplus sunplus 4147200 May 13 02:56 FHD_face.yuv
-rwxrwxr-x 1 sunplus sunplus      32 May 11 11:30 .gitignore
-rwxrwxr-x 1 sunplus sunplus    1806 May 15 05:55 makefile
drwxrwxr-x 4 sunplus sunplus    4096 May 15 05:47 sources
-rwxrwxr-x 1 sunplus sunplus 1843200 May 11 11:30 yuv2.uyvy

FHD_face.yuv: A 1080P YUV image in the YUYV format.
yuv2.uyvy: A 720P image in the UYVY format.
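
The file sizes in the listing can be cross-checked against the stated resolutions, since packed 4:2:2 formats such as YUYV and UYVY use 2 bytes per pixel. The helpers below are a small sketch of that check; `packed422FrameBytes` and `isSingleFrameFile` are hypothetical names, not functions from the sample.

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of one packed 4:2:2 frame (YUYV or UYVY): 2 bytes per pixel.
constexpr uint64_t packed422FrameBytes(uint32_t width, uint32_t height) {
    return static_cast<uint64_t>(width) * height * 2u;
}

// True when a raw file of `fileBytes` bytes holds exactly one frame of the given size.
bool isSingleFrameFile(uint64_t fileBytes, uint32_t width, uint32_t height) {
    assert(width % 2 == 0);  // 4:2:2 packs pixels in horizontal pairs
    return fileBytes == packed422FrameBytes(width, height);
}
```

For 1920x1080 this gives 4147200 bytes and for 1280x720 it gives 1843200 bytes, matching FHD_face.yuv and yuv2.uyvy in the listing.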

As listed above, the sample ships with test code, test images, and a makefile for reference, which can be used to compile and run the code.

If you are cross-compiling, please use the GCC toolchain arm64-9.2 or arm64-10.2 (none-linux-gnu), which can be downloaded from the official ARM website. Additionally, if you do not have an OpenCV library built for arm64, you may need to modify the makefile.

As shown in the figure, two changes are needed: 1. disable OpenCV; 2. specify the cross-compiler path.

After running the make command and completing the compilation, the generated binaries are located under _out/bins.

sunplus@ubuntu:~/workspace/neon/optimize_samples$ ls _out/bins/ -l
total 324
-rwxrwxr-x 1 sunplus sunplus 128160 May 15 05:55 converter
-rwxrwxr-x 1 sunplus sunplus  75112 May 15 05:55 converter_cv
-rwxrwxr-x 1 sunplus sunplus 119784 May 15 05:55 converter_o3

Execution Example (the reported times are in milliseconds):

sunplus@ubuntu:~/workspace/neon/optimize_samples$ ./_out/bins/converter_o3 neon_scale
testType: 6(neon_scale) vs 5(calc_scale)
convert calc_scale succeed, (1920x1080) -> (960x540) by (2x2) takes 10
convert neon_scale succeed, (1920x1080) -> (960x540) by (2x2) takes 3
sunplus@ubuntu:~/workspace/neon/optimize_samples$ ./_out/bins/converter_cv
The image(1920x1080) convert & scale down(2x) takes: 13

API Introduction

YUVConverter.h

uint32_t yuvToBgrByNeon(
    int yuvFormat, uint32_t width, uint32_t height,
    uint8_t* yuvBuffer, uint8_t* rgbBuffer, uint8_t* rgbBufferInterleaved);

uint32_t yuvToBgrByNorm(
    int yuvFormat, uint32_t width, uint32_t height,
    uint8_t* yuvBuffer, uint8_t* rgbBuffer, uint8_t* rgbBufferInterleaved);

uint32_t yuvToGrayByNeon(
    int yuvFormat, uint32_t width, uint32_t height,
    uint8_t* yuvBuffer, uint8_t* gray8Buffer, uint8_t* gray24BufferInterleaved);

yuvToBgrByNeon: Converts YUV to BGR with the same size, accelerated by NEON.
yuvToBgrByNorm: Converts YUV to BGR with the same size, without NEON acceleration.
yuvToGrayByNeon: Converts YUV to grayscale with the same size, accelerated by NEON.
Parameters and Return Values for the Above Three Functions:
yuvFormat: Specifies the image format, either YUYV or UYVY.
width: The width of the image.
height: The height of the image.
yuvBuffer: The content of the YUV image.
rgbBuffer: The converted image in planar form, where the B, G, and R channels are stored as three separate planes, one after another. The format in memory is as follows:

bbbbb.....bbbb <-- width * height (blue channel)
ggggg.....gggg <-- width * height (green channel)
rrrrrrrr......rrrrrrr <-- width * height (red channel)

rgbBufferInterleaved: The converted image in interleaved form, where the B, G, and R values of each pixel are stored next to each other. The format in memory is as follows:

bgrbgrbgr.....bgrbgrbgr <-- width * height * 3 (interleaved BGR pixels)

Return Value: The size of the converted image. If the conversion fails, it returns 0.
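
To make the difference between the two output layouts concrete, the sketch below rearranges a planar buffer (B plane, G plane, R plane) into the interleaved BGR order described above. `planarToInterleaved` is a hypothetical helper written for illustration; it is not part of YUVConverter.h, and it assumes each layout occupies width * height * 3 bytes, as the diagrams above indicate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Planar layout:      all B samples, then all G samples, then all R samples.
// Interleaved layout: B, G, R of pixel 0, then B, G, R of pixel 1, and so on.
std::vector<uint8_t> planarToInterleaved(const std::vector<uint8_t>& planar,
                                         uint32_t width, uint32_t height) {
    const size_t pixels = static_cast<size_t>(width) * height;
    assert(planar.size() == pixels * 3);
    std::vector<uint8_t> interleaved(pixels * 3);
    for (size_t i = 0; i < pixels; ++i) {
        interleaved[3 * i + 0] = planar[i];               // blue plane
        interleaved[3 * i + 1] = planar[pixels + i];      // green plane
        interleaved[3 * i + 2] = planar[2 * pixels + i];  // red plane
    }
    return interleaved;
}
```

The interleaved layout is the one OpenCV uses for a 3-channel 8-bit BGR `cv::Mat`; the planar layout may be more convenient for frameworks that take one plane per channel.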

YUVConverterScale.h

uint32_t yuvToBgrByNeonScale(
    int yuvFormat, uint32_t width, uint32_t height, uint32_t scaleFactor,
    uint8_t* yuvBuffer, uint8_t* rgbBuffer, uint8_t* rgbBufferInterleaved);

uint32_t yuvToBgrByNeonWHScale(
    int yuvFormat,
    uint32_t width, uint32_t scaleFactorW,
    uint32_t height, uint32_t scaleFactorH,
    uint8_t* yuvBuffer, uint8_t* rgbBuffer, uint8_t* rgbBufferInterleaved);

uint32_t yuvToBgrByNormScale(
    int yuvFormat, uint32_t width, uint32_t height, uint32_t scaleFactor,
    uint8_t* yuvBuffer, uint8_t* rgbBuffer, uint8_t* rgbBufferInterleaved);

uint32_t yuvToBgrByNormWHScale(
    int yuvFormat,
    uint32_t width, uint32_t scaleFactorW,
    uint32_t height, uint32_t scaleFactorH,
    uint8_t* yuvBuffer, uint8_t* rgbBuffer, uint8_t* rgbBufferInterleaved);

yuvToBgrByNeonScale: Converts YUV to BGR and scales the width and height to a specified factor, accelerated by NEON.
yuvToBgrByNeonWHScale: Converts YUV to BGR and allows scaling the width and height to different factors, accelerated by NEON.
yuvToBgrByNormScale: Converts YUV to BGR and scales the width and height to a specified factor, without NEON acceleration.
yuvToBgrByNormWHScale: Converts YUV to BGR and allows scaling the width and height to different factors, without NEON acceleration.
The return values of the above four functions are the same as those in YUVConverter.h, and parameters with the same names have the same meanings. The additional parameters are as follows:
- scaleFactor: The factor by which both the width and the height are scaled down. It must be a multiple of 2, such as 2, 4, 6, ..., 16.
- scaleFactorW: The factor by which the width is scaled down. It must be a multiple of 2, such as 2, 4, 6, ..., 16.
- scaleFactorH: The factor by which the height is scaled down. It must be a multiple of 2, such as 2, 4, 6, ..., 16.
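
The relationship between the scale factors and the output size can be sketched as follows. The output dimensions follow the width / scaleFactorW and height / scaleFactorH arithmetic used by the test code; the upper bound of 16 is an assumption taken from the factors exercised in main.cpp, and `ScaledSize`, `isSupportedScaleFactor`, and `scaledSize` are hypothetical names, not part of YUVConverterScale.h.

```cpp
#include <cassert>
#include <cstdint>

struct ScaledSize { uint32_t width; uint32_t height; };

// Accept only even factors from 2 to 16.
// Assumption: 16 is the largest factor, since it is the largest one tested.
constexpr bool isSupportedScaleFactor(uint32_t factor) {
    return factor >= 2 && factor <= 16 && factor % 2 == 0;
}

// Output dimensions after scaling down by independent width and height factors.
ScaledSize scaledSize(uint32_t width, uint32_t scaleFactorW,
                      uint32_t height, uint32_t scaleFactorH) {
    assert(isSupportedScaleFactor(scaleFactorW));
    assert(isSupportedScaleFactor(scaleFactorH));
    return { width / scaleFactorW, height / scaleFactorH };
}
```

With a 1920x1080 input, a factor of 2 in both directions yields 960x540, and a factor of 6 yields 320x180, the CNN input size mentioned in the scenario introduction.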

API Usage Sample

Please refer to the code of the test function in MainTestRunner.cpp:

uint32_t MainTestRunner::test(int scaleFactorW, int scaleFactorH, bool dumpFile,
                          int yuvFormat, uint32_t width, uint32_t height,
                          uint8_t* yuvBuffer, uint8_t* rgbBuffer, uint8_t* rgbBufferInterleaved)
{
    auto prevNorm = chrono::system_clock::now();

    /* convert  */
    uint32_t outputSize = 0;
    switch(this->testType)
    {
        case MAIN_TEST_CALC_COMM:
            outputSize = yuvToBgrByNorm(yuvFormat, width, height, yuvBuffer, rgbBuffer, rgbBufferInterleaved);
            break;
        case MAIN_TEST_NEON_COMM:
            outputSize = yuvToBgrByNeon(yuvFormat, width, height, yuvBuffer, rgbBuffer, rgbBufferInterleaved);
            break;
        case MAIN_TEST_NEON_GREY:
            outputSize = yuvToGrayByNeon(yuvFormat, width, height, yuvBuffer, rgbBuffer, rgbBufferInterleaved);
            break;

        case MAIN_TEST_CALC_SCALE:
            outputSize = yuvToBgrByNormWHScale(yuvFormat, width, scaleFactorW, height, scaleFactorH, yuvBuffer, rgbBuffer, rgbBufferInterleaved);
            break;
        case MAIN_TEST_NEON_SCALE:
            outputSize = yuvToBgrByNeonWHScale(yuvFormat, width, scaleFactorW, height, scaleFactorH, yuvBuffer, rgbBuffer, rgbBufferInterleaved);
            break;
        default:
            break;
    }

    auto durationNorm = chrono::duration_cast<chrono::milliseconds>(chrono::system_clock::now() - prevNorm);
    printf("convert %s %s, (%ux%u) -> (%ux%u) by (%ux%u) takes %lu\n",
           this->getType(this->testType).c_str(), outputSize > 0 ? "succeed" : "failed",
           width, height, width / scaleFactorW, height / scaleFactorH, scaleFactorW, scaleFactorH, durationNorm.count());

    if (outputSize > 0 && dumpFile)
    {
        if (rgbBufferInterleaved != nullptr)
        {
            //this->saveFile(rgbBufferInterleaved, outputSize, scaleFactor, this->getType(this->testType) + "_interleaved");
            if (testType == MAIN_TEST_CALC_SCALE || testType == MAIN_TEST_NEON_SCALE) {
                width = width / scaleFactorW;
                height = height / scaleFactorH;
            }
            this->saveJpegFile(rgbBufferInterleaved, width, height, scaleFactorW, scaleFactorH, this->getType(this->testType) + "_interleaved");
        }
    }

    return outputSize;
}
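
The test function above times a single call at millisecond resolution. When a conversion takes only a few milliseconds, averaging several runs on a monotonic clock can give a finer reading. The helper below is one possible sketch of that idea; `meanMicroseconds` is a hypothetical name and is not part of the sample code.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Run `work` `iterations` times and return the mean duration of one run in microseconds.
// std::chrono::steady_clock is monotonic, so system clock adjustments do not affect the result.
double meanMicroseconds(const std::function<void()>& work, int iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration<double, std::micro>(elapsed).count() / iterations;
}
```

The callable passed as `work` could, for example, be a lambda that wraps one of the conversion calls shown in the switch statement above.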

The caller code is as follows (located in main.cpp):

/* real test */
if (testType != MAIN_TEST_NEON_SCALE)
{
    printf("testType: %d(%s) vs %d(%s)\n",
           testType, MainTestRunner::getType(testType).c_str(),
           MAIN_TEST_CALC_COMM, MainTestRunner::getType(MAIN_TEST_CALC_COMM).c_str());

    auto testRunner1 = make_shared<MainTestRunner>(MAIN_TEST_CALC_COMM);
    auto testRunner2 = make_shared<MainTestRunner>(testType);

    outputDataSize = testRunner1->test(dumpFile, FRAME_FMT, FRAME_WIDTH, FRAME_HEIGHT, yuvBuffer, nullptr, outputBuffer1);
    outputDataSize = testRunner2->test(dumpFile, FRAME_FMT, FRAME_WIDTH, FRAME_HEIGHT, yuvBuffer, nullptr, outputBuffer2);

    if (outputBuffer1 != nullptr && outputBuffer2 != nullptr) {
        compaire_data(0, outputBuffer1, outputBuffer2, 0, outputDataSize);
    }
}
else
{
    printf("testType: %d(%s) vs %d(%s)\n",
           testType, MainTestRunner::getType(testType).c_str(),
           MAIN_TEST_CALC_SCALE, MainTestRunner::getType(MAIN_TEST_CALC_SCALE).c_str());

    auto testRunner1 = make_shared<MainTestRunner>(MAIN_TEST_CALC_SCALE);
    auto testRunner2 = make_shared<MainTestRunner>(testType);

    auto scaleFactors = vector<int>({2, 4, 6, 8, 10, 12, 14, 16});
    for (auto scaleFactorW : scaleFactors)
    {
        for (auto scaleFactorH : scaleFactors)
        {
            outputDataSize = testRunner1->test(scaleFactorW, scaleFactorH, dumpFile, FRAME_FMT, FRAME_WIDTH, FRAME_HEIGHT, yuvBuffer, nullptr, outputBuffer1);
            outputDataSize = testRunner2->test(scaleFactorW, scaleFactorH, dumpFile, FRAME_FMT, FRAME_WIDTH, FRAME_HEIGHT, yuvBuffer, nullptr, outputBuffer2);

            if (outputBuffer1 != nullptr && outputBuffer2 != nullptr) {
                compaire_data(0, outputBuffer1, outputBuffer2, 0, outputDataSize);
            }
        }
    }
}
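
The caller compares the output of the plain implementation with the output of the implementation under test. The implementation of compaire_data is not shown here, so the following is only a hypothetical sketch of such a check, written as a byte-wise comparison. The `tolerance` parameter is an added assumption for cases where two implementations round differently; pass 0 for an exact comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Count the bytes whose absolute difference exceeds `tolerance`.
// A return value of 0 means the two buffers agree within the tolerance.
size_t countMismatches(const uint8_t* a, const uint8_t* b, size_t size, int tolerance) {
    assert(a != nullptr && b != nullptr && tolerance >= 0);
    size_t mismatches = 0;
    for (size_t i = 0; i < size; ++i) {
        const int diff = std::abs(static_cast<int>(a[i]) - static_cast<int>(b[i]));
        if (diff > tolerance) {
            ++mismatches;
        }
    }
    return mismatches;
}
```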

Code

Please refer to the attachment for the sample code mentioned above and the implementation code of YUV Converter.

YUV Conversion and Scaling by Neon