Hey CanMV Community,
I'm working on a face detection pipeline using the K230 (hardware v1.1) with CanMV MicroPython and encountering performance bottlenecks during frame preprocessing.
The issue arises during input preprocessing. Frames are captured at 1920x1080 from my camera hardware. Before feeding a frame into the model, I resize it while maintaining the aspect ratio; since the resized frame no longer matches the model's expected input size, I then pad it out to that size. The problem is that the board hangs (stops executing) during the padding step.
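For concreteness, here is a small sketch of the letterbox arithmetic I'm describing. Note this uses (width, height) ordering, and the 320x320 model input is just an assumed example size, not necessarily what the kmodel actually expects:

```python
def letterbox_geometry(src_wh, dst_wh):
    """Compute the aspect-preserving resize size and the centering pad offsets."""
    w_src, h_src = src_wh
    w_dst, h_dst = dst_wh
    scale = min(w_dst / w_src, h_dst / h_src)  # fit the image inside the target box
    w_new, h_new = int(w_src * scale), int(h_src * scale)
    pad_left = (w_dst - w_new) // 2
    pad_top = (h_dst - h_new) // 2
    return (w_new, h_new), (pad_left, pad_top)

# 1920x1080 into a hypothetical 320x320 model input:
size, pad = letterbox_geometry((1920, 1080), (320, 320))
# size == (320, 180), pad == (0, 70)
```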
To work around this, I wrote my own resize and padding functions instead of using the ai2d module, since we don't have access to its internals.
Steps to Reproduce:
- Set up a face detection pipeline on the K230 with CanMV MicroPython, using a YuNet model for detection.
- Use the following functions (see code below) to preprocess the input image before passing it to the face detection model.
- Run the pipeline with rgb888p_size = [1920, 1080] and display_size = [1920, 1080] on an HDMI display, processing live video frame by frame.
- Observe the CanMV terminal and display output as we run the code.
Software and Hardware Version Information:
Hardware: K230 development board, version 1.1.
Software: CanMV MicroPython (latest version as of March 17, 2025), YuNet model (k230_face_detection_yunet.kmodel).
Code:
def resize_nearest(self, img, target_size):
    """Resize a CHW image with nearest-neighbor interpolation, preserving aspect ratio."""
    c, h_old, w_old = img.shape  # channels, height, width
    h_target, w_target = target_size
    # Scale factor that fits the image inside the target box (maintains aspect ratio)
    factor = min(h_target / h_old, w_target / w_old)
    h_new = int(h_old * factor)
    w_new = int(w_old * factor)
    resized_img = np.zeros((c, h_new, w_new), dtype=img.dtype)  # keep the source dtype
    # Nearest-neighbor interpolation via per-pixel Python loops
    row_scale = h_old / h_new
    col_scale = w_old / w_new
    for ch in range(c):  # iterate over each channel
        for i in range(h_new):
            orig_y = int(i * row_scale)
            for j in range(w_new):
                orig_x = int(j * col_scale)
                resized_img[ch, i, j] = img[ch, orig_y, orig_x]
    return resized_img
def pad_image(self, img, target_size):
    """Center a CHW image on a zero-filled canvas of target_size."""
    c, h_old, w_old = img.shape
    h_target, w_target = target_size
    # Split the height/width difference between the two sides
    diff_0 = h_target - h_old  # difference in height
    diff_1 = w_target - w_old  # difference in width
    pad_top = diff_0 // 2
    pad_bottom = diff_0 - pad_top
    pad_left = diff_1 // 2
    pad_right = diff_1 - pad_left
    # Zero-filled array of the target size, same dtype as the input
    padded_img = np.zeros((c, h_target, w_target), dtype=img.dtype)
    # Copy the original image into the center, per pixel
    for ch in range(c):  # loop over each channel
        for i in range(h_old):
            for j in range(w_old):
                padded_img[ch, i + pad_top, j + pad_left] = img[ch, i, j]
    return padded_img
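For what it's worth, I suspect the board isn't crashing but grinding through the per-pixel loops: at 1080p with 3 channels that is over six million interpreted iterations per function call, per frame. Below is a vectorized sketch of both steps. It is written and tested against desktop NumPy; on CanMV, ulab should support the slice assignment used in the padding, but the integer-array fancy indexing used in the resize may not be available in every firmware build:

```python
import numpy as np

def resize_nearest_fast(img, target_size):
    """Nearest-neighbor resize of a CHW image, preserving aspect ratio."""
    c, h_old, w_old = img.shape
    h_target, w_target = target_size
    factor = min(h_target / h_old, w_target / w_old)
    h_new, w_new = int(h_old * factor), int(w_old * factor)
    # Precompute the source row/column indices once instead of per pixel
    rows = (np.arange(h_new) * (h_old / h_new)).astype(np.int32)
    cols = (np.arange(w_new) * (w_old / w_new)).astype(np.int32)
    # Fancy indexing gathers every output pixel in one C-level pass
    return img[:, rows[:, None], cols[None, :]]

def pad_image_fast(img, target_size):
    """Center a CHW image on a zero canvas via slice assignment."""
    c, h_old, w_old = img.shape
    h_target, w_target = target_size
    pad_top = (h_target - h_old) // 2
    pad_left = (w_target - w_old) // 2
    padded = np.zeros((c, h_target, w_target), dtype=img.dtype)
    # One bulk copy into the centered region, no per-pixel loop
    padded[:, pad_top:pad_top + h_old, pad_left:pad_left + w_old] = img
    return padded
```

Even if the fancy-indexed resize isn't supported by your ulab build, replacing just the padding loop with the slice assignment should remove most of the cost of that step.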