Optimizing Resize and Pad Functions for Face Detection on K230 with CanMV MicroPython

Hey CanMV Community,

I'm working on a face detection pipeline using the K230 (hardware v1.1) with CanMV MicroPython and encountering performance bottlenecks during frame preprocessing.

The problem occurs during input preprocessing. Frames are captured at 1920x1080, and before feeding a frame into the model I resize it while maintaining the aspect ratio. Because the resized image then no longer matches the model's expected input size, I pad it out to the target dimensions. During this padding step the board hangs and stops executing code.
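For concreteness, assuming the 320x320 model input used in the stock CanMV YuNet example (adjust for your .kmodel), the letterbox arithmetic works out as follows:

# Hypothetical numbers for a 1920x1080 frame and a 320x320 model input
w_old, h_old = 1920, 1080
w_target, h_target = 320, 320
scale = min(h_target / h_old, w_target / w_old)        # 320/1920 = 1/6
w_new, h_new = int(w_old * scale), int(h_old * scale)  # 320 x 180 after resize
pad_h = h_target - h_new   # 140 px total, split between top and bottom
pad_w = w_target - w_new   # 0 px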

To work around the hang, I wrote custom resize and pad functions instead of using the ai2d module, since we don't have access to its internals.

Steps to Reproduce:

  1. Set up a face detection pipeline on the K230 with CanMV MicroPython, using a YuNet model for detection.
  2. Use the following functions (see code below) to preprocess the input image before passing it to the face detection model.
  3. Run the pipeline with rgb888p_size = [1920, 1080] and display_size = [1920, 1080] on an HDMI display, processing live video frame by frame.
  4. Observe the CanMV terminal and display output while the code runs.

Software and Hardware Version Information:

Hardware: K230 development board, version 1.1.
Software: CanMV MicroPython (latest version as of March 17, 2025), YuNet model (k230_face_detection_yunet.kmodel).

Code:

# Methods on my preprocessing class; np is ulab.numpy (import ulab.numpy as np)
def resize_nearest(self, img, target_size):
    """Resizes a CHW image using nearest-neighbor interpolation."""
    c, h_old, w_old = img.shape  # Extract Channels, Height, Width
    h_target, w_target = target_size

    # Compute scaling factors
    factor_0 = h_target / h_old
    factor_1 = w_target / w_old
    factor = min(factor_0, factor_1)  # Maintain aspect ratio

    h_new = int(h_old * factor)
    w_new = int(w_old * factor)

    resized_img = np.zeros((c, h_new, w_new))

    # Nearest-neighbor interpolation
    row_scale = h_old / h_new
    col_scale = w_old / w_new

    for ch in range(c):  # Iterate over each channel
        for i in range(h_new):
            for j in range(w_new):
                orig_y = int(i * row_scale)
                orig_x = int(j * col_scale)
                resized_img[ch, i, j] = img[ch, orig_y, orig_x]

    return resized_img

def pad_image(self, img, target_size):
    """Pads a CHW image to target_size by centering it on a zero canvas."""
    c, h_old, w_old = img.shape
    h_target, w_target = target_size

    # Compute padding sizes
    diff_0 = h_target - h_old  # Difference in height
    diff_1 = w_target - w_old  # Difference in width

    # Padding values (top, bottom) and (left, right)
    pad_top = diff_0 // 2
    pad_bottom = diff_0 - pad_top
    pad_left = diff_1 // 2
    pad_right = diff_1 - pad_left

    # Create a new zero-filled array of the target size
    padded_img = np.zeros((c, h_target, w_target))

    # Copy original image into the center of the new padded image
    for ch in range(c):  # Loop over each channel
        for i in range(h_old):
            for j in range(w_old):
                padded_img[ch, i + pad_top, j + pad_left] = img[ch, i, j]

    return padded_img
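I suspect the hang is heap-related: np.zeros without a dtype allocates float arrays rather than uint8 (roughly 4x the memory per temporary), and the nested loops execute hundreds of thousands of Python-level index operations per frame. Below is a loop-free sketch I am considering, assuming ulab's strided slicing and multi-dimensional slice assignment (available in recent CanMV builds) and a scale factor that reduces to an exact integer stride, as 1920x1080 -> 320x180 does (stride 6):

import ulab.numpy as np

def resize_nearest_fast(self, img, target_size):
    """Nearest-neighbor downscale via strided slicing, no Python loops.

    Sketch: assumes the aspect-preserving scale factor is 1/step for an
    integer step, e.g. 1920x1080 -> 320x180 uses step = 6.
    """
    c, h_old, w_old = img.shape
    h_target, w_target = target_size
    factor = min(h_target / h_old, w_target / w_old)
    step = int(round(1 / factor))   # e.g. 6 for 1080p -> 320x320 letterbox
    return img[:, ::step, ::step]   # keeps every step-th pixel on each axis

def pad_image_fast(self, img, target_size):
    """Center-pads a CHW uint8 image with a single slice assignment."""
    c, h_old, w_old = img.shape
    h_target, w_target = target_size
    pad_top = (h_target - h_old) // 2
    pad_left = (w_target - w_old) // 2
    padded = np.zeros((c, h_target, w_target), dtype=np.uint8)  # uint8, not float
    padded[:, pad_top:pad_top + h_old, pad_left:pad_left + w_old] = img
    return padded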
1 Answer

Hello, you can use ai2d for preprocessing, which significantly improves execution efficiency. To perform padding and resize operations with ai2d, you can refer to the following code:

from libs.PipeLine import PipeLine, ScopedTiming
from libs.AI2D import Ai2d
from media.media import *
import nncase_runtime as nn
import ulab.numpy as np
import os,sys,gc
import image

if __name__ == "__main__":
    # Display mode, default is "hdmi", can be set to "hdmi" or "lcd"
    display_mode = "hdmi"
    if display_mode == "hdmi":
        display_size = [1920, 1080]
    else:
        display_size = [800, 480]

    # Initialize PipeLine for image processing workflow
    pl = PipeLine(rgb888p_size=[512, 512], display_size=display_size, display_mode=display_mode)
    pl.create()  # Create PipeLine instance

    my_ai2d = Ai2d(debug_mode=0)  # Initialize Ai2d instance
    my_ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)

    # Configure pad preprocessing method
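    # paddings lists [before, after] pairs for each NCHW dim:
    # [N, N, C, C, H_top, H_bottom, W_left, W_right];
    # pad_mode=0 is constant padding, pad_val gives the per-channel fill value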
    my_ai2d.pad(paddings=[0, 0, 0, 0, 15, 15, 30, 30], pad_mode=0, pad_val=[114, 114, 114])

    # Configure resize preprocessing
    my_ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)

    # Build preprocessing pipeline
    my_ai2d.build([1, 3, 512, 512], [1, 3, 320, 320])
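    # build(input_shape, output_shape): ai2d applies the configured pad first,
    # then resizes the padded result to output_shape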

    try:
        while True:
            os.exitpoint()
            with ScopedTiming("total", 1):
                img = pl.get_frame()  # Get current frame data
                print(img.shape)      # Original image shape: [1, 3, 512, 512]

                # Execute preprocessing: pad 15px on top and bottom (H) and
                # 30px on left and right (W), then resize to 320x320
                ai2d_output_tensor = my_ai2d.run(img)

                ai2d_output_np = ai2d_output_tensor.to_numpy()  # Convert to numpy array
                print(ai2d_output_np.shape)  # Shape after preprocessing: [1, 3, 320, 320]

                # Use transpose to convert to HWC layout, then create an Image
                # instance in RGB888 format for IDE display
                shape = ai2d_output_np.shape
                ai2d_output_tmp = ai2d_output_np.reshape((shape[0] * shape[1], shape[2] * shape[3]))
                ai2d_output_tmp_trans = ai2d_output_tmp.transpose()
                ai2d_output_hwc = ai2d_output_tmp_trans.copy().reshape((shape[2], shape[3], shape[1]))

                out_img = image.Image(320, 320, image.RGB888, alloc=image.ALLOC_REF, data=ai2d_output_hwc)
                out_img.compress_for_ide()

                gc.collect()  # Garbage collection
    except Exception as e:
        sys.print_exception(e)
    finally:
        pl.destroy()  # Destroy PipeLine instance (runs even on exit or error)
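For the original 1920x1080 case, the same mechanism can produce an aspect-preserving letterbox entirely in ai2d. A sketch, assuming YuNet's 320x320 input as used in the build call above and the pad-then-resize order ai2d applies: pad the 1080-pixel height symmetrically up to 1920 so the frame becomes square, then let ai2d resize it to 320x320.

# Sketch: aspect-preserving 1920x1080 -> 320x320 preprocessing
# (assumes a 320x320 model input; adjust if your .kmodel differs)
letterbox = Ai2d(debug_mode=0)
letterbox.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)
# Pad height from 1080 to 1920 (420 px top and bottom) to make the frame square
letterbox.pad(paddings=[0, 0, 0, 0, 420, 420, 0, 0], pad_mode=0, pad_val=[114, 114, 114])
letterbox.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
letterbox.build([1, 3, 1080, 1920], [1, 3, 320, 320])
# model_input = letterbox.run(frame)  # frame shape: [1, 3, 1080, 1920]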