
I wish to change darknet_video.py from https://github.com/AlexeyAB/darknet/blob/master/darknet_video.py into a multiprocessing program instead of a multithreading one, to avoid Python's GIL and achieve actual parallelism.

However, the conversion is tricky: a multiprocessing Queue only accepts picklable objects, and global variables are not shared between processes.

I keep getting the error "ValueError: ctypes objects containing pointers cannot be pickled".

I would appreciate any help; this has been bothering me for weeks.
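
A minimal repro of the error, independent of darknet — any ctypes structure with a pointer field triggers it:

import ctypes
import pickle

class WithPointer(ctypes.Structure):
    _fields_ = [("data", ctypes.POINTER(ctypes.c_float))]

# structures without pointer fields pickle fine; this one raises:
# ValueError: ctypes objects containing pointers cannot be pickled
pickle.dumps(WithPointer())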

Code for darknet.py (I believe I have to edit these classes to be picklable, but I have no idea how to do so):

# excerpt from darknet.py, which does `from ctypes import *`
class BOX(Structure):
    _fields_ = [("x", c_float),
                ("y", c_float),
                ("w", c_float),
                ("h", c_float)]


class DETECTION(Structure):
    _fields_ = [("bbox", BOX),
                ("classes", c_int),
                ("best_class_idx", c_int),
                ("prob", POINTER(c_float)),
                ("mask", POINTER(c_float)),
                ("objectness", c_float),
                ("sort_class", c_int),
                ("uc", POINTER(c_float)),
                ("points", c_int),
                ("embeddings", POINTER(c_float)),
                ("embedding_size", c_int),
                ("sim", c_float),
                ("track_id", c_int)]

class DETNUMPAIR(Structure):
    _fields_ = [("num", c_int),
                ("dets", POINTER(DETECTION))]


class IMAGE(Structure):
    _fields_ = [("w", c_int),
                ("h", c_int),
                ("c", c_int),
                ("data", POINTER(c_float))]


class METADATA(Structure):
    _fields_ = [("classes", c_int),
                ("names", POINTER(c_char_p))]

Code for darknet_video.py:

from ctypes import *
import random
import os
import cv2
import time
import darknet
import argparse
#from threading import Thread, enumerate
from multiprocessing import Process, Queue
#from queue import Queue


def parser():
    parser = argparse.ArgumentParser(description="YOLO Object Detection")
    parser.add_argument("--input", type=str, default=0,
                        help="video source. If empty, uses webcam 0 stream")
    parser.add_argument("--out_filename", type=str, default="",
                        help="inference video name. Not saved if empty")
    parser.add_argument("--weights", default="yolov4.weights",
                        help="yolo weights path")
    parser.add_argument("--dont_show", action='store_false',
                        help="windown inference display. For headless systems")
    parser.add_argument("--ext_output", action='store_true',
                        help="display bbox coordinates of detected objects")
    parser.add_argument("--config_file", default="./cfg/yolov4.cfg",
                        help="path to config file")
    parser.add_argument("--data_file", default="./cfg/coco.data",
                        help="path to data file")
    parser.add_argument("--thresh", type=float, default=.25,
                        help="remove detections with confidence below this value")
    return parser.parse_args()


def str2int(video_path):
    """
    argparse returns a string although webcam uses an int (0, 1, ...)
    Cast to int if needed
    """
    try:
        return int(video_path)
    except ValueError:
        return video_path


def check_arguments_errors(args):
    assert 0 < args.thresh < 1, "Threshold should be a float between zero and one (non-inclusive)"
    if not os.path.exists(args.config_file):
        raise(ValueError("Invalid config path {}".format(os.path.abspath(args.config_file))))
    if not os.path.exists(args.weights):
        raise(ValueError("Invalid weight path {}".format(os.path.abspath(args.weights))))
    if not os.path.exists(args.data_file):
        raise(ValueError("Invalid data file path {}".format(os.path.abspath(args.data_file))))
    if isinstance(str2int(args.input), str) and not os.path.exists(args.input):
        raise(ValueError("Invalid video path {}".format(os.path.abspath(args.input))))


def set_saved_video(input_video, output_video, size):
    fourcc = cv2.VideoWriter_fourcc(*"MJPG")  # concatenates 4 chars into a fourcc code (MJPG video codec)
    fps = int(input_video.get(cv2.CAP_PROP_FPS))
    video = cv2.VideoWriter(output_video, fourcc, fps, size)
    return video


def convert2relative(bbox, darknet_width, darknet_height):
    """
    YOLO format uses relative coordinates for annotation.
    The network dimensions are passed in explicitly: globals set in
    __main__ are not shared with child processes.
    """
    x, y, w, h = bbox
    return x/darknet_width, y/darknet_height, w/darknet_width, h/darknet_height


def convert2original(image, bbox, darknet_width, darknet_height):
    x, y, w, h = convert2relative(bbox, darknet_width, darknet_height)

    image_h, image_w, __ = image.shape

    orig_x       = int(x * image_w)
    orig_y       = int(y * image_h)
    orig_width   = int(w * image_w)
    orig_height  = int(h * image_h)

    bbox_converted = (orig_x, orig_y, orig_width, orig_height)

    return bbox_converted


def convert4cropping(image, bbox, darknet_width, darknet_height):
    x, y, w, h = convert2relative(bbox, darknet_width, darknet_height)

    image_h, image_w, __ = image.shape

    orig_left    = int((x - w / 2.) * image_w)
    orig_right   = int((x + w / 2.) * image_w)
    orig_top     = int((y - h / 2.) * image_h)
    orig_bottom  = int((y + h / 2.) * image_h)

    if (orig_left < 0): orig_left = 0
    if (orig_right > image_w - 1): orig_right = image_w - 1
    if (orig_top < 0): orig_top = 0
    if (orig_bottom > image_h - 1): orig_bottom = image_h - 1

    bbox_cropping = (orig_left, orig_top, orig_right, orig_bottom)

    return bbox_cropping


def video_capture(frame_queue, darknet_image_queue, darknet_width, darknet_height, input_path):

    cap = cv2.VideoCapture(input_path)
    video_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    video_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frame_resized = cv2.resize(frame_rgb, (darknet_width, darknet_height),
                                                interpolation=cv2.INTER_LINEAR)
        frame_queue.put(frame)
        img_for_detect = darknet.make_image(darknet_width, darknet_height, 3)
        darknet.copy_image_from_bytes(img_for_detect, frame_resized.tobytes())
        # this put() is what raises the pickling ValueError: IMAGE holds a
        # POINTER(c_float), and a multiprocessing Queue pickles every item
        darknet_image_queue.put(img_for_detect)
    cap.release()


def inference(darknet_image_queue, detections_queue, fps_queue, network, class_names):
    # note: `args` below is a module-level global; children see it under the
    # fork start method (Linux default) but not under spawn (Windows/macOS)
    while True:
        darknet_image = darknet_image_queue.get()
        prev_time = time.time()
        detections = darknet.detect_image(network, class_names, darknet_image, thresh=args.thresh)
        detections_queue.put(detections)

        fps = int(1/(time.time() - prev_time))
        fps_queue.put(fps)
        print("FPS: {}".format(fps))
        darknet.print_detections(detections, args.ext_output)
        darknet.free_image(darknet_image)


def drawing(frame_queue, detections_queue, fps_queue, class_colors, darknet_width, darknet_height):
    random.seed(3)  # deterministic bbox colors
    #video = set_saved_video(cap, args.out_filename, (video_width, video_height))

    counts = dict()
    while True:
        y_coord = 20
        for key in counts:
            counts[key] = 0  # reset per-frame label counts
        frame = frame_queue.get()

        if detections_queue.qsize() == 0:
            continue
        detections = detections_queue.get()
        fps = fps_queue.get()

        detections_adjusted = []
        if frame is not None:
            for label, confidence, bbox in detections:
                bbox_adjusted = convert2original(frame, bbox, darknet_width, darknet_height)
                detections_adjusted.append((str(label), confidence, bbox_adjusted))
                counts[label] = counts.get(label, 0) + 1

            image = darknet.draw_boxes(detections_adjusted, frame, class_colors)

            if args.dont_show:
                cv2.imshow('Inference', image)
                if cv2.waitKey(1) & 0xFF == ord('q'):
                    break
            if args.out_filename is not None:
                pass
                #video.write(image)

            if cv2.waitKey(fps) == 27:
                break



if __name__ == '__main__':
    frame_queue = Queue()
    darknet_image_queue = Queue(maxsize=1)
    detections_queue = Queue(maxsize=1)
    fps_queue = Queue(maxsize=1)

    args = parser()
    check_arguments_errors(args)
    network, class_names, class_colors = darknet.load_network(
            args.config_file,
            args.data_file,
            args.weights,
            batch_size=1
        )
    darknet_width = darknet.network_width(network)
    darknet_height = darknet.network_height(network)
   
    input_path = str2int(args.input)
    t1 = Process(target=video_capture, args=(frame_queue, darknet_image_queue, darknet_width, darknet_height, input_path)) 
    t2 = Process(target=inference, args=(darknet_image_queue, detections_queue, fps_queue, network, class_names))
    t3 = Process(target=drawing, args=(frame_queue, detections_queue, fps_queue, class_colors, darknet_width, darknet_height))

    p_list = [t1, t2, t3]

    for p in p_list:
        p.start()
    for p in p_list:
        p.join()
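
One workaround I'm considering is to never put ctypes objects on a queue at all: pass the resized frame (a plain numpy array, which pickles fine) through the queue instead, and build the darknet IMAGE inside the inference process. A sketch of the two changed functions (untested; signatures differ from the script above, and it reuses the script's imports cv2 / time / darknet):

def video_capture(frame_queue, resized_queue, darknet_width, darknet_height, input_path):
    cap = cv2.VideoCapture(input_path)
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frame_resized = cv2.resize(frame_rgb, (darknet_width, darknet_height),
                                   interpolation=cv2.INTER_LINEAR)
        frame_queue.put(frame)
        resized_queue.put(frame_resized)  # numpy array: pickles without issue
    cap.release()
    resized_queue.put(None)               # sentinel so the consumer can stop


def inference(resized_queue, detections_queue, fps_queue, network, class_names, thresh):
    # the ctypes IMAGE lives only inside this process; it never crosses a queue
    width = darknet.network_width(network)
    height = darknet.network_height(network)
    darknet_image = darknet.make_image(width, height, 3)
    while True:
        frame_resized = resized_queue.get()
        if frame_resized is None:
            break
        prev_time = time.time()
        darknet.copy_image_from_bytes(darknet_image, frame_resized.tobytes())
        detections = darknet.detect_image(network, class_names, darknet_image, thresh=thresh)
        detections_queue.put(detections)  # plain (label, confidence, bbox) tuples
        fps_queue.put(int(1 / (time.time() - prev_time)))
    darknet.free_image(darknet_image)

Caveat: queue items are always pickled regardless of start method, but Process arguments are only pickled under spawn (the Windows/macOS default); under fork (the Linux default) the ctypes network handle is simply inherited, which is why passing network as an argument works there. Under spawn, the network would have to be loaded inside inference instead.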

1 Answer


I had / have the same problem; I'm somewhat forced into it by my hardware's limited computing power (I only get about 3 FPS using the GPU, and less than 1 on the CPU, on my laptop).

I covered this topic in quite some detail in [SO]: Pickling a ctypes.Structure with ct.Pointer (@CristiFati's answer).
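
The general idea of making such a structure picklable (a simplified sketch of my own, not the full code from the linked answer): implement __reduce__ and serialize the pointed-to data yourself, provided the pointer's length can be derived from another field. The real DETECTION has several more pointer fields that would each need the same treatment:

import ctypes as ct

def _rebuild(classes, probs):
    # module-level so pickle can import it when reconstructing
    det = SimpleDetection()
    det.classes = classes
    buf = (ct.c_float * len(probs))(*probs)
    det.prob = ct.cast(buf, ct.POINTER(ct.c_float))
    det._buf = buf  # keep the buffer alive as long as the struct
    return det

class SimpleDetection(ct.Structure):
    _fields_ = [("classes", ct.c_int),
                ("prob", ct.POINTER(ct.c_float))]

    def __reduce__(self):
        # `classes` says how many floats `prob` points to, so the
        # pointed-to data can be copied into a plain (picklable) list
        probs = [self.prob[i] for i in range(self.classes)] if self.prob else []
        return (_rebuild, (self.classes, probs))

if __name__ == "__main__":
    import pickle
    det = _rebuild(3, [0.1, 0.7, 0.2])
    det2 = pickle.loads(pickle.dumps(det))  # no ValueError anymore
    print([det2.prob[i] for i in range(det2.classes)])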

I submitted an attempt (just so it's available for others): [GitHub]: AlexeyAB/darknet - Multiprocessing in darknet_video.py (NOT final!), but it seems to be a dead end (or maybe I'm missing something).
As a side note, considering the number of open PRs and the recent commit rate, I'm not sure how actively maintained that repository (or its parent) is.

For tests, I cropped a video down to 1 second (given the FPS rates I get), and launched it in 2 different Cmd terminals:

  • On the left side, a run that uses threading (completed)

  • On the right side, the multiprocessing counterpart (specified via --multiprocess), currently running (the upper right side shows its effects)

Output: Img00 (captured while running, to show some FPS values) and Img01 (at the end).

– CristiFati
  • Thank you Sir, I actually managed to convert the code to multiprocessing, but similarly to your results, the performance of the detections on the video dropped dramatically (for mine, the FPS dropped by half compared to multithreading). Do you have a list of possible reasons why this happens? Anyway, thank you for your help; your explanations and rationale have been clear and concise :) – Superman2341 Jun 18 '22 at 04:40
  • As I mentioned somewhere, I think it's the massive data pickling / unpickling that takes a lot of time, but it's just a supposition. It seems that the task (if possible at all) is quite a bit larger than I anticipated. – CristiFati Jun 20 '22 at 08:41
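
A possible mitigation for that pickling cost (a sketch only; the helper names put_frame / get_frame are my own, and it assumes Python 3.8+ for multiprocessing.shared_memory): share the frame pixels via shared memory and send only tiny metadata through the queue, so the queue never pickles whole frames.

import numpy as np
from multiprocessing import shared_memory

def put_frame(queue, frame):
    # copy the frame into a fresh shared-memory block; only the block name
    # and the array metadata travel through the (pickling) queue
    shm = shared_memory.SharedMemory(create=True, size=frame.nbytes)
    view = np.ndarray(frame.shape, dtype=frame.dtype, buffer=shm.buf)
    view[:] = frame
    queue.put((shm.name, frame.shape, frame.dtype.str))
    shm.close()  # the block itself persists until unlink()

def get_frame(queue):
    name, shape, dtype = queue.get()
    shm = shared_memory.SharedMemory(name=name)
    frame = np.ndarray(shape, dtype=np.dtype(dtype), buffer=shm.buf).copy()
    shm.close()
    shm.unlink()  # the consumer owns cleanup in this sketch
    return frame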