
How can I convert a video to multiple numpy arrays (or a single one) to use for machine learning? I have only found ways to do this for images.

Saso

1 Answer

A regular image is represented as a 3D tensor with the shape (height, width, channels). The channels value is 3 if the image is RGB and 1 if it is grayscale.

A video is a collection of N frames, where each frame is an image. You'd want to represent this data as a 4D tensor: (frames, height, width, channels).

So for example if you have 1 minute of video with 30 fps, where each frame is RGB and has a resolution of 256x256, then your tensor would look like this: (1800, 256, 256, 3), where 1800 is the number of frames in the video: 30 (fps) * 60 (seconds).
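You can verify this shape arithmetic directly in NumPy; the zeros array below is just a stand-in for real pixel data:

```python
import numpy as np

# 1 minute of 30 fps RGB video at 256x256 (hypothetical numbers from above).
fps, seconds = 30, 60
n_frames = fps * seconds  # 30 * 60 = 1800 frames

video = np.zeros((n_frames, 256, 256, 3), dtype=np.uint8)
print(video.shape)  # (1800, 256, 256, 3)
```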

To achieve this you can open each individual frame of the video, store them all in a list, and stack them together along a new axis (i.e. the "frames" dimension).


You can do this through OpenCV:

import cv2
import numpy as np

# Open the video and read it frame by frame.
vid = cv2.VideoCapture('path/to/video/file')

frames = []
check = True
i = 0

while check:
    check, arr = vid.read()
    if check and i % 20 == 0:  # This condition is if you want to subsample
                               # your video (i.e. keep one frame every 20);
                               # checking `check` avoids appending the None
                               # returned by the final failed read.
        frames.append(arr)
    i += 1

vid.release()
frames = np.array(frames)  # convert list of frames to a numpy array
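If you want to sanity-check the stacking step without a real video file, you can run it on synthetic frames (random arrays standing in for the images returned by `vid.read()`):

```python
import numpy as np

# Five fake 48x64 RGB frames standing in for decoded video frames.
frame_list = [np.random.randint(0, 256, (48, 64, 3), dtype=np.uint8)
              for _ in range(5)]

frames = np.array(frame_list)  # stacks along a new leading "frames" axis
print(frames.shape)  # (5, 48, 64, 3)
```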
Djib2011