Remapping Video Buffer for CPU Processing

Video buffers de-queued from the MIPI-RX of the SP7350 are initially uncached memory, because they are intended for further processing by hardware engines such as the ISP and the video codec. Uncached memory, however, is poorly suited to direct CPU processing. This document shows how to configure MIPI-RX to remap video buffers to cached memory before they are de-queued, so that the CPU can process them efficiently.

1. Operation of MIPI-RX on SP7350

MIPI-RX on SP7350 uses the DMA contiguous (videobuf2-dma-contig.c) model for video buffers, supporting only physically contiguous memory. In this model, video buffers are allocated using the dma_alloc_attrs() function, ensuring they are virtually and physically contiguous. This is essential for the MIPI-RX DMA engine, which requires contiguous memory to store an image frame.

When video buffers are sent to another hardware engine (e.g., the ISP or the video codec), that engine accesses them by DMA using multi-word transfers. The data flow is depicted as follows:

image-20240522-103812.png

2. CPU Accesses to Uncached Video Buffer

DMA coherent memory, being uncached, avoids consistency issues. However, CPU access to uncached video buffers is inefficient, as every read or write operation is performed word by word directly to memory. This inefficiency is illustrated below:

image-20240522-103842.png

In this scenario, the MIPI-RX stores the received image in a video buffer, which is uncached memory. The CPU then reads from this uncached memory, processes the data, and writes the processed data back to the same uncached memory. This process is inherently slow due to the direct memory accesses required for each read and write operation.

3. Benefits of Remapping Video Buffers to Cached Memory

To enhance efficiency, the MIPI-RX on SP7350 can remap video buffers to cached memory before de-queuing them to user space applications. With cached memory, the CPU fetches a whole cache line (64 bytes on the Cortex-A55) from video buffer memory at once, so successive reads are served from the cache. Similarly, CPU writes go to the cache first, and the cache write-back logic eventually writes whole cache lines to memory. This efficiency is illustrated below:
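As a rough numeric sketch of the same point, the script below counts how many memory transfers it takes to touch every byte of one 1920*1080 UYVY frame once. The function names are hypothetical, the 4-byte word size is an assumption, and the cache-line size is passed in as a parameter (two illustrative values are shown; substitute the actual line size of the CPU). It ignores prefetching, write buffering, and bus burst behavior, so it only indicates the order of magnitude.

```sh
# Bytes in one frame: WIDTH * HEIGHT * BYTES_PER_PIXEL.
# Packed UYVY (YUV 4:2:2) uses 2 bytes per pixel.
frame_bytes() {
    echo $(( $1 * $2 * $3 ))
}

# transfers TOTAL_BYTES TRANSFER_SIZE
# Number of transfers needed to cover TOTAL_BYTES, rounding up so that
# a partial final transfer still counts.
transfers() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

bytes=$(frame_bytes 1920 1080 2)
echo "frame size: $bytes bytes"
# Uncached case: one access per word (4-byte word size is an assumption).
echo "4-byte word accesses: $(transfers "$bytes" 4)"
# Cached case: one fill per cache line (line sizes here are illustrative).
for line in 64 128; do
    echo "${line}-byte line fills: $(transfers "$bytes" "$line")"
done
```

For one frame this gives 1036800 word accesses, against 64800 fills with 64-byte lines or 32400 fills with 128-byte lines.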

4. Using sysfs to Enable Video Buffer Remapping

To enable the remapping of video buffers to cached memory, execute the following command:

echo 1 > /sys/module/videobuf2_dma_contig/parameters/remap

To disable this function, use the command:

echo 0 > /sys/module/videobuf2_dma_contig/parameters/remap

By default, the remap function is disabled.
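The two commands can be wrapped in a small helper that checks the parameter file is present and writable before touching it. This is only a sketch: the function names and the `REMAP_PARAM` override variable are hypothetical, and writing to sysfs is assumed to require root.

```sh
# Parameter path as given above. REMAP_PARAM is a hypothetical override,
# useful for a dry run against an ordinary file.
REMAP_PARAM="${REMAP_PARAM:-/sys/module/videobuf2_dma_contig/parameters/remap}"

# Print the current raw value of the parameter.
get_remap() {
    cat "$REMAP_PARAM"
}

# set_remap 1 enables remapping, set_remap 0 disables it.
set_remap() {
    case "$1" in
        0|1) ;;
        *) echo "usage: set_remap 0|1" >&2; return 2 ;;
    esac
    if [ ! -w "$REMAP_PARAM" ]; then
        echo "cannot write $REMAP_PARAM (parameter missing, or not root?)" >&2
        return 1
    fi
    echo "$1" > "$REMAP_PARAM"
}
```

After sourcing the file, `set_remap 1` followed by `get_remap` before starting a capture confirms the setting. The exact text read back depends on how the driver declares the parameter type, so the helper does not interpret it.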

5. Test Report

5.1 Test UYVY 1920*1080 Format Video Image Using OV5640 Camera

Set the frame rate to 30 fps (data rate: 124.416 MB/s) and store the video image in the /tmp directory (tmpfs type) using:

v4l2-ctl -d /dev/video0 --set-fmt-video=width=1920,height=1080,pixelformat=UYVY --set-parm=30 --stream-mmap=10 --stream-to=/tmp/OV5640_UYVY_1080P_001.raw --stream-skip=3 --stream-count=180
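The quoted data rate follows from the frame size and the frame rate, given that packed UYVY takes 2 bytes per pixel and taking 1 MB as 1,000,000 bytes. A small sketch of the arithmetic (the function name is hypothetical):

```sh
# data_rate_mb WIDTH HEIGHT FPS [BYTES_PER_PIXEL]
# Prints the raw stream rate in MB/s (1 MB = 1,000,000 bytes).
# BYTES_PER_PIXEL defaults to 2, the size of one pixel in packed UYVY.
data_rate_mb() {
    bytes_per_sec=$(( $1 * $2 * ${4:-2} * $3 ))
    awk -v b="$bytes_per_sec" 'BEGIN { printf "%.3f\n", b / 1000000 }'
}

data_rate_mb 1920 1080 30   # prints 124.416
data_rate_mb 1280 720 30    # prints 55.296
data_rate_mb 640 480 30     # prints 18.432
```

The three calls reproduce the data rates quoted for the 1920*1080, 1280*720, and 640*480 tests in this report.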

Results:

| Remap   | CPU Utilization (%) | Frame Rate (fps) |
|---------|---------------------|------------------|
| Disable | 100                 | 15               |
| Enable  | 32                  | 30               |

Performance improvement: (100/32) × (30/15) = 6.25 times
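The improvement figure combines two ratios: how much less CPU is used with remapping enabled, and how much higher the achieved frame rate is. A minimal sketch of that calculation follows; the function name is hypothetical, and the inputs are the measured values from the result tables in this report.

```sh
# improvement CPU_DISABLED CPU_ENABLED FPS_DISABLED FPS_ENABLED
# CPU utilization ratio (remap disabled / enabled) multiplied by the
# frame rate ratio (remap enabled / disabled), printed to two decimals.
improvement() {
    awk -v cd="$1" -v ce="$2" -v fd="$3" -v fe="$4" \
        'BEGIN { printf "%.2f\n", (cd / ce) * (fe / fd) }'
}

improvement 100 32 15 30      # 1920*1080: prints 6.25
improvement 88.1 14.6 30 30   # 1280*720:  prints 6.03
improvement 34.7 5.2 30 30    # 640*480:   prints 6.67
```

When the frame rate is 30 fps in both cases, the second ratio is 1 and the result reduces to the CPU utilization ratio alone.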

5.2 Test UYVY 1280*720 Format Video Image Using OV5640 Camera

Set the frame rate to 30 fps (data rate: 55.296 MB/s) and store the video image in the /tmp directory (tmpfs type), using the same v4l2-ctl invocation as in Section 5.1 with width=1280,height=720.

Results:

| Remap   | CPU Utilization (%) | Frame Rate (fps) |
|---------|---------------------|------------------|
| Disable | 88.1                | 30               |
| Enable  | 14.6                | 30               |

Performance improvement: 88.1/14.6 = 6.03 times

5.3 Test UYVY 640*480 Format Video Image Using OV5640 Camera

Set the frame rate to 30 fps (data rate: 18.432 MB/s) and store the video image in the /tmp directory (tmpfs type), using the same v4l2-ctl invocation as in Section 5.1 with width=640,height=480.

Results:

| Remap   | CPU Utilization (%) | Frame Rate (fps) |
|---------|---------------------|------------------|
| Disable | 34.7                | 30               |
| Enable  | 5.2                 | 30               |

Performance improvement: 34.7/5.2 = 6.67 times