I would like to split a video so that, for every frame, I have an object containing both that frame's image and the audio samples that play during it (for example, as an array of bytes).
I've found many guides on how to extract just the images, but none that also cover the audio.
How would I split a video up like that, for example using Python?
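
To make the goal concrete, here is a rough sketch of the per-frame structure I have in mind. It is only an illustration under some assumptions: the frames and the audio have already been decoded by some other means, the audio is raw interleaved PCM bytes, and the frame rate is constant. The names `FrameWithAudio` and `pair_frames_with_audio` are made up for this example and do not come from any library.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class FrameWithAudio:
    """Hypothetical container: one video frame plus the audio that plays during it."""
    index: int
    image: bytes  # raw image data for this frame
    audio: bytes  # PCM bytes covering this frame's time span


def pair_frames_with_audio(
    frames: Sequence[bytes],
    pcm: bytes,
    fps: float,
    sample_rate: int,
    bytes_per_sample: int = 2,  # assumption: 16-bit samples
    channels: int = 1,          # assumption: mono
) -> List[FrameWithAudio]:
    """Slice already-decoded PCM audio into one chunk per video frame."""
    block = bytes_per_sample * channels  # bytes per sample across all channels
    total_samples = len(pcm) // block
    result = []
    for i, image in enumerate(frames):
        # Frame i spans the time interval [i / fps, (i + 1) / fps).
        start = min(int(i * sample_rate / fps), total_samples)
        end = min(int((i + 1) * sample_rate / fps), total_samples)
        result.append(FrameWithAudio(i, image, pcm[start * block:end * block]))
    return result
```

For example, at 25 fps with 48000 Hz audio, each frame would carry 48000 / 25 = 1920 samples. If the audio track is shorter than the video, the trailing frames simply get a shorter (possibly empty) audio chunk.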