I am processing a video using OpenCV in Python. I have to do some processing on each frame of the video and store the result of that processing, so I first tried the straightforward sequential approach (sketched after this list):
- Read one frame at a time.
- Do processing on that frame, append the result to a list.
- Repeat.
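
A minimal sketch of that sequential loop (here `process_frame` is just a placeholder standing in for my real per-frame processing):

```
import cv2

def process_frame(frame):
    # placeholder for the actual per-frame processing
    return frame.mean()

cap = cv2.VideoCapture('../movie.mp4')
results = []
while True:
    grabbed, frame = cap.read()
    if not grabbed:                        # no more frames to read
        break
    results.append(process_frame(frame))   # process, then store the result
cap.release()
```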
But then I found that while I am processing one frame, I can already read the next frame of the video, so after giving it some thought, I arrived at this approach:

- Read a frame from the video.
- Append the frame to a list, i.e. `frames.append(frame)`. If the list size reaches 100, submit a job to the pool with `jobs.append(ppool.apply_async(process_fun, (frames,)))` and then empty the list for the next batch of frames, i.e. `frames = []`.
- Repeat.
- When done reading frames, wait for the jobs to complete and then fetch their results.

(Here `process_fun` is the function that does the processing on a batch of frames.)
Here is the complete code for that:
```
import cv2
import os
import random
import multiprocessing as mp


def process(frames):
    lower = 0
    upper = 30
    frames_len = len(frames)
    idx = 0
    black_counts = []
    while idx < frames_len:
        # cv2.imwrite('frame ' + str(idx) + ".jpg", frames[idx])
        frames[idx] = cv2.cvtColor(frames[idx], cv2.COLOR_BGR2GRAY)
        black_selected = cv2.inRange(frames[idx], lower, upper)
        black_count = cv2.countNonZero(black_selected)
        black_counts.append(black_count)
        idx += 1
    return black_counts


if __name__ == "__main__":
    ppool = mp.Pool(processes=os.cpu_count())

    cap = cv2.VideoCapture('../movie.mp4')
    total_frames = cap.get(cv2.CAP_PROP_FRAME_COUNT)
    print("total_frames:", total_frames)

    # random_frame_start = random.randint(1, total_frames)
    # cap.set(cv2.CAP_PROP_POS_FRAMES, random_frame_start)
    # print("random_frame_start: ", random_frame_start)

    # frames_to_read = 1500
    frames_to_read = total_frames

    frames = []
    jobs = []
    while frames_to_read >= 0:
        grabbed, frame = cap.read()
        if grabbed:
            frames.append(frame)
            frames_to_read -= 1
            # every 100 frames, submit the batch to the pool and start a new one
            if frames_to_read % 100 == 0:
                print("appending a job at frames_to_read:", frames_to_read)
                jobs.append(ppool.apply_async(process, (frames,)))
                frames = []
    cap.release()

    ppool.close()
    ppool.join()

    results = []
    for job in jobs:
        results.append(job.get())
    print("results are:", results)
```
The posted code reads all the frames of some movie.mp4. When I run it with `frames_to_read` set to a value between 100 and 5000 (using the commented-out code), it runs fine (although it still takes a large amount of memory), but when I run it with `frames_to_read` equal to `total_frames`, it consumes my whole memory (8 GB of RAM and all of swap) and the system hangs.

So, my question is: why is this code consuming so much memory that the system hangs and I have to forcefully (physically) switch it off? The memory consumption over a single run keeps increasing and never decreases. Why is this happening, and how do I solve it? :)
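For a sense of scale, here is a rough back-of-envelope estimate of what a single batch of raw decoded frames costs. The resolution of movie.mp4 isn't stated above, so 1920x1080 is an assumption; the exact numbers would change with the real resolution, but not the order of magnitude:

```
# Rough memory estimate for one batch of 100 raw BGR frames.
# 1920x1080 is an assumed (hypothetical) resolution.
width, height, channels = 1920, 1080, 3     # uint8 BGR frame from cap.read()
bytes_per_frame = width * height * channels  # 6,220,800 bytes
bytes_per_batch = 100 * bytes_per_frame

print(f"one frame: {bytes_per_frame / 2**20:.1f} MiB")   # ~5.9 MiB
print(f"one batch: {bytes_per_batch / 2**20:.1f} MiB")   # ~593 MiB
# ...and each batch is additionally pickled when it is sent to a worker.
```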
I found some other SO questions where people had this same problem, like this and this, but neither of those solutions worked for me.