
More specifically, how do I create a custom reader that reads frames from a video and feeds them into the TensorFlow model graph?

Second, how can I use OpenCV to decode the frames for such a custom reader, if that is possible?

Is there any code (in Python) that demonstrates this?

I am mainly working on emotion recognition through facial expression and I have videos as input in my database.

Finally, I have tried using a Queue and a QueueRunner with a Coordinator, hoping to solve the problem at hand. According to the documentation at https://www.tensorflow.org/programmers_guide/threading_and_queues, the QueueRunner runs the enqueue operation, which in turn takes an operation that creates one example. (Can we use OpenCV in this operation to return the frames as the examples to enqueue?)

Please note that my purpose is to have the enqueue and dequeue operations occur at the same time on different threads.
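The producer/consumer pattern itself (one thread enqueueing while another dequeues) can be shown in plain Python, independently of TensorFlow. Below is a minimal sketch; `produce_frame` is a made-up stand-in for the OpenCV capture-and-preprocess step, and the `threading.Event` mirrors the role of `coord.should_stop()`:

```python
import queue
import threading

def produce_frame(i):
    # Stand-in for cv2.VideoCapture(...).read() plus preprocessing.
    return "frame-%d" % i

def producer(q, n_frames, stop):
    for i in range(n_frames):
        if stop.is_set():
            break
        q.put(produce_frame(i))  # blocks when the queue is full
    q.put(None)                  # sentinel: no more frames

def consumer(q, results):
    while True:
        frame = q.get()          # blocks until a frame is available
        if frame is None:        # sentinel reached: end of video
            break
        results.append(frame)

q = queue.Queue(maxsize=128)     # bounded, like tf.FIFOQueue(capacity=128)
stop = threading.Event()
results = []
t_prod = threading.Thread(target=producer, args=(q, 5, stop))
t_cons = threading.Thread(target=consumer, args=(q, results))
t_prod.start(); t_cons.start()
t_prod.join(); t_cons.join()
print(results)  # ['frame-0', 'frame-1', 'frame-2', 'frame-3', 'frame-4']
```

The bounded queue gives backpressure for free: the producer blocks on `put` when the consumer falls behind, which is the same behavior a capacity-limited `tf.FIFOQueue` provides.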

Following is my code so far:

import cv2
import numpy as np
import tensorflow as tf

def deform_images(images):
    with tf.name_scope('current_image'):
        frames_resized = tf.image.resize_images(images, [90, 160])
        frame_gray = tf.image.rgb_to_grayscale(frames_resized, name='rgb_to_gray')
        frame_normalized = tf.divide(frame_gray, tf.constant(255.0), name='image_normalization')

        tf.summary.image('image_summary', frame_gray, 1)
        return frame_normalized

def queue_input(video_path, coord):
    global frame_index
    with tf.device("/cpu:0"):
        # keep looping infinitely

        # source: http://stackoverflow.com/questions/33650974/opencv-python-read-specific-frame-using-videocapture
        cap = cv2.VideoCapture(video_path)
        cap.set(1, frame_index)

        # Read the next frame from the file. Note that the frame is returned
        # as a Mat, so we need to convert it into a tensor.
        (grabbed, frame) = cap.read()

        # if the `grabbed` boolean is `False`, then we have
        # reached the end of the video file
        if not grabbed:
            coord.request_stop()
            return

        img = np.asarray(frame)
        frame_index += 1
        to_return = deform_images(img)
        print(to_return.get_shape())
        return to_return

frame_index = 0

with tf.Session() as sess:
    merged = tf.summary.merge_all()
    train_writer = tf.summary.FileWriter('C:\\Users\\temp_user\\Documents\\tensorboard_logs', sess.graph)
    sess.run(tf.global_variables_initializer())

    coord = tf.train.Coordinator()
    queue = tf.FIFOQueue(capacity=128, dtypes=tf.float32, shapes=[90, 160, 1])
    enqueue_op = queue.enqueue(queue_input("RECOLA-Video-recordings\\P16.mp4", coord))

    # Create a queue runner that will run 1 thread to enqueue examples.
    # In general, the QueueRunner class creates a number of threads that
    # cooperate to enqueue tensors in the same queue.
    qr = tf.train.QueueRunner(queue, [enqueue_op] * 1)

    # Create a coordinator, launch the queue runner threads.
    # Note that the coordinator class helps multiple threads stop together and report exceptions to programs that wait
    # for them to stop.
    enqueue_threads = qr.create_threads(sess, coord=coord, start=True)

    # Run the training loop, controlling termination with the coordinator.
    for step in range(8000):
        print(step)
        if coord.should_stop():
            break

        frames_tensor = queue.dequeue(name='dequeue')

    coord.join(enqueue_threads)

train_writer.close()
cv2.destroyAllWindows()
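As a side check of the preprocessing, the grayscale-and-normalize part of deform_images can be reproduced with NumPy alone to verify expected shapes and value ranges. This is only a sketch: the resize step is omitted (it would need cv2.resize or tf.image.resize_images), and deform_images_np is a hypothetical helper using the BT.601 luma weights that tf.image.rgb_to_grayscale also uses:

```python
import numpy as np

def deform_images_np(frame):
    # Grayscale + normalize only; resizing to 90x160 is left out here.
    weights = np.array([0.2989, 0.5870, 0.1140])  # BT.601 luma weights
    gray = frame @ weights                        # (H, W, 3) -> (H, W)
    return (gray / 255.0)[..., np.newaxis]        # keep an (H, W, 1) shape

frame = np.full((90, 160, 3), 255.0)  # dummy all-white RGB frame
out = deform_images_np(frame)
print(out.shape)  # (90, 160, 1)
```

Every output value of the all-white frame is close to 1.0 (the weights sum to 0.9999), which matches what the tf.divide by 255.0 should produce.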

Thank you!!

user1780735
  • How's the performance of this code? I need to do some similar work. If I can get some feedback from someone who has already done it, it would be better. – scott huang Oct 19 '17 at 06:15

2 Answers


tf.train.QueueRunner is not the most suitable mechanism for your purpose. In the code you have, the following line

enqueue_op = queue.enqueue(queue_input("RECOLA-Video-recordings\\P16.mp4", coord))

creates an enqueue_op that enqueues a constant tensor: the first frame returned by the queue_input function. Even though the QueueRunner calls it repeatedly, it always enqueues the same tensor, the one provided when the operation was created. Instead, you can make the enqueue operation take a tf.placeholder as its argument and run it repeatedly in a loop, feeding it each frame you grab via OpenCV. Here is some code to guide you.

import threading

frame_ph = tf.placeholder(tf.float32)
enqueue_op = queue.enqueue(frame_ph)

def enqueue():
  while not coord.should_stop():
    # NB: for feed_dict to work, queue_input must return a NumPy array
    # (e.g. the raw, preprocessed OpenCV frame), not a graph tensor.
    frame = queue_input(video_path, coord)
    if frame is None:  # end of video: queue_input requested a stop
      break
    sess.run(enqueue_op, feed_dict={frame_ph: frame})

threads = [threading.Thread(target=enqueue)]

for t in threads:
  t.start()

# Your dequeue and training code goes here
coord.join(threads)
keveman

pip install video2tfrecord

Explanation

During a research project I needed to generate TFRecords from raw video material in Python. Having come across many requests very similar to this thread, I made part of my code available at

https://github.com/ferreirafabio/video2tfrecords

whiletrue