How can I convert a video into multiple numpy arrays, or a single one, to use for machine learning? I have only found ways to do this with images.
-
A video is a collection of images. – programandoconro May 21 '21 at 23:34
-
Does this answer your question? [How to turn a video into numpy array?](https://stackoverflow.com/questions/42163058/how-to-turn-a-video-into-numpy-array) – Nathan Mills May 21 '21 at 23:35
1 Answer
A regular image is represented as a 3D tensor with the shape (height, width, channels). The channels value is 3 if the image is RGB and 1 if it is grayscale.
A video is a collection of N frames, where each frame is an image. You'd want to represent this data as a 4D tensor: (frames, height, width, channels).
So for example, if you have 1 minute of video at 30 fps, where each frame is RGB with a resolution of 256x256, then your tensor would have the shape (1800, 256, 256, 3), where 1800 is the number of frames in the video: 30 (fps) * 60 (seconds).
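As a quick sanity check of that arithmetic, here is a minimal NumPy-only sketch (the zeros are just a stand-in for real pixel data):

```python
import numpy as np

fps = 30
seconds = 60
height, width, channels = 256, 256, 3

# One minute of 30 fps RGB video at 256x256.
video = np.zeros((fps * seconds, height, width, channels), dtype=np.uint8)
print(video.shape)  # (1800, 256, 256, 3)
```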
To achieve this you can open each individual frame of the video, store them all in a list, and stack them along a new leading axis (i.e. the "frames" dimension).
You can do this through OpenCV:
import cv2
import numpy as np

# Open the video and cut it into frames.
vid = cv2.VideoCapture('path/to/video/file')
frames = []
i = 0
while True:
    check, arr = vid.read()
    if not check:  # stop once there are no frames left to read
        break
    if i % 20 == 0:  # optional: subsample the video
                     # (i.e. keep one frame every 20)
        frames.append(arr)
    i += 1
vid.release()
frames = np.array(frames)  # convert list of frames to a 4D numpy array
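Since running OpenCV requires an actual video file, here is a self-contained sketch with synthetic frames (random arrays standing in for decoded images) showing that stacking a list of (height, width, channels) frames produces the 4D tensor described above, including the subsampling step:

```python
import numpy as np

# Synthetic stand-ins for decoded frames: 90 RGB frames of 64x64.
frames = [np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
          for _ in range(90)]

# Keep one frame every 20, as in the loop above.
subsampled = [f for i, f in enumerate(frames) if i % 20 == 0]

# np.array stacks the frames along a new leading "frames" axis.
video = np.array(subsampled)
print(video.shape)  # (5, 64, 64, 3)
```

One detail worth knowing: OpenCV decodes frames in BGR channel order, so if your model expects RGB you'd convert each frame with cv2.cvtColor(arr, cv2.COLOR_BGR2RGB) before appending it.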
