
I've been wondering what the optical flow matrix returned by OpenCV's calcOpticalFlowFarneback function actually represents. If I run this Python line:

flow = cv2.calcOpticalFlowFarneback(cv2.UMat(prvs), cv2.UMat(next), None, 0.5, 3, 15, 3, 5, 1.2, 0)

I get a matrix the same size as the prvs and next frames, containing at each position a two-element vector (x, y). My question is: does that vector point from prvs to next, or from next to prvs?

Thanks.

  • From `prvs` to `next`, of course! You can verify this by just taking a look at the values. The pixel at `prvs[y, x]` will move by `flow[y, x]` pixels to get to `next[y', x']`. In other words, `next[flow[y, x]] = prvs[y, x]`. (This is just meant as a rough example; you'd need to take special care with the indexing ordering here. A quick synthetic check is sketched right after these comments.) – alkasm Dec 09 '17 at 22:12
  • Then, I don't understand why it gives me bad results. I'm trying to use that flow to perform Motion Interpolation and interpolate intermediate frames. If I wanted to move a point from the `prvs` frame to the exact intermediate point between `prvs` and `next`, what should I consider? `prvs[x,y] + flow[x,y]` or `prvs[x,y] - flow[x,y]`? @AlexanderReynolds –  Dec 09 '17 at 22:27
  • And why do you use `[y,x]` instead of `[x,y]`? Are the flow vector's elements swapped, i.e. the y component first and the x component second? @AlexanderReynolds –  Dec 09 '17 at 22:32
  • `[y, x]` because they're images, which are arrays and as such indexed with `(row, col)` i.e. `(y, x)`. `prvs[y, x]` is a *pixel*, `flow[y, x]` is a *vector*, you should not be adding them. Technically it should be `next[ [y, x] + flow[y, x][::-1] ] = prvs[y, x]`, the `[::-1]` there because I think `flow[y, x]` will give you coordinates in (x, y) order so `[::-1]` reverses them to `[y, x]` for indexing. I can try it out and give you a better response later when I'm home. – alkasm Dec 10 '17 at 00:43
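For a quick sanity check on the direction, here's a minimal standalone sketch: shift a textured patch by a known offset between two synthetic frames and look at the estimated flow. Farneback is an estimate, so expect values near the true shift rather than exact ones.

import cv2
import numpy as np

# A textured patch moves 5 px right and 2 px down from prvs to next.
# If the flow points from prvs to next, the flow over the patch should
# be roughly (+5, +2), in (x, y) order.
rng = np.random.default_rng(0)
patch = (rng.random((20, 20)) * 255).astype(np.uint8)

prvs = np.zeros((100, 100), np.uint8)
next_ = np.zeros((100, 100), np.uint8)
prvs[40:60, 40:60] = patch
next_[42:62, 45:65] = patch

flow = cv2.calcOpticalFlowFarneback(prvs, next_, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)
print(flow[50, 50])  # approximately [5. 2.]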

1 Answer


The general purpose of an optical flow method is to find the velocity component of each pixel (if dense) or of each feature point (if sparse) between two images (or video frames, typically). The idea is that pixels in frame N-1 move to new positions in frame N, and the difference in the location of these pixels is like a velocity vector. That means that a pixel at location (x, y) in the previous frame will be at location (x+v_x, y+v_y) in the next frame.

For the values of pixels, that means that for a given position (x, y), the value of the pixel at prev_frame(x, y) is the same as the value of the pixel at curr_frame(x+v_x, y+v_y). Or more specifically, in terms of actual array indices:

prev_frame[y, x] == curr_frame[y + flow[y, x, 1], x + flow[y, x, 0]]

Notice the reversed ordering of (x, y) here. Arrays are indexed with (row, col) ordering, which means the y component comes first, then the x component. Take special care to note that flow[y, x] is a vector where the first element is the x coordinate and the second is the y coordinate, which is why I wrote y + flow[y, x, 1] and x + flow[y, x, 0]. You'll see the same thing written in the docs for calcOpticalFlowFarneback():

The function finds an optical flow for each prev pixel using the Farneback algorithm so that

prev(y,x) ~ next(y + flow(y,x)[1], x + flow(y,x)[0])
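Here's what that relation looks like as a quick sketch, assuming grayscale frames prev and curr and a flow computed between them as in the scripts below. The flow is sub-pixel, so round before indexing and expect the values to be close rather than identical:

y, x = 50, 50                    # any pixel location (row, col)
dx, dy = flow[y, x]              # the flow vector is in (x, y) order
y2, x2 = int(round(y + dy)), int(round(x + dx))
print(prev[y, x], curr[y2, x2])  # pixel values should be close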

Dense optical flow algorithms expect the pixels to be fairly close to where they started, so they're typically used on video, where there isn't a huge amount of change from one frame to the next. If there's a massive difference between frames, you're likely not going to get a proper estimation. Of course, the purpose of the pyramid resolution model is to help with larger jumps, but you'll need to take care in choosing the proper resolution scales.
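For reference, here's the same Farneback call with each positional argument labeled (assuming two grayscale frames prev and curr); levels and winsize are the usual knobs to turn when the inter-frame motion is larger:

flow = cv2.calcOpticalFlowFarneback(
    prev, curr, None,  # previous frame, next frame, no initial flow guess
    0.5,               # pyr_scale: image scale (<1) between pyramid levels
    3,                 # levels: number of pyramid levels
    15,                # winsize: averaging window size
    3,                 # iterations: iterations at each pyramid level
    5,                 # poly_n: pixel neighborhood for polynomial expansion
    1.2,               # poly_sigma: Gaussian sigma for the expansion
    0,                 # flags
)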

Here's a full-fledged example. I'll start with this short timelapse that I shot in Vancouver earlier this year. I'll create a function that encodes the direction of the flow at each pixel as a color, and the magnitude of the flow as the brightness of that color. That means brighter pixels correspond to larger flows, and the color corresponds to the direction. This is what they do in the last example of the OpenCV optical flow tutorial as well.

import cv2
import numpy as np

def flow_to_color(flow, hsv):
    # Direction -> hue, magnitude -> brightness (value channel).
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv[..., 0] = ang*180/np.pi/2  # radians to degrees, halved to fit hue's [0, 180] range
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

cap = cv2.VideoCapture('vancouver.mp4')

fps = cap.get(cv2.CAP_PROP_FPS)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter('optflow.mp4', fourcc, fps, (w, h))

optflow_params = [0.5, 3, 15, 3, 5, 1.2, 0]

frame_exists, prev_frame = cap.read()
prev = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
hsv = np.zeros_like(prev_frame)
hsv[..., 1] = 255  # full saturation; hue and value carry the flow information

while cap.isOpened():
    frame_exists, curr_frame = cap.read()
    if frame_exists:
        curr = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None, *optflow_params)
        rgb = flow_to_color(flow, hsv)
        out.write(rgb)
        prev = curr
    else:
        break

cap.release()
out.release()
print('done')

And here's the resulting video.

However, what you want to do is interpolate between frames. This gets a little confusing, because the best way to do that is with cv2.remap(), but this function works in the opposite direction from the one we want. The optical flow tells us where each pixel goes, but remap() wants to know where each pixel comes from. So we actually need to swap the order of the frames in the optical flow calculation before handing the result to remap(). See my answer here for a thorough explanation of the remap() function.
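To make remap()'s direction concrete, here's a tiny standalone sketch: the map tells each output pixel where to read from in the source, so adding a positive x offset to an identity map makes content appear to move left, not right.

import cv2
import numpy as np

img = np.arange(100, dtype=np.float32).reshape(10, 10)

# Identity map in (x, y) order, then read from 2 px to the right.
ys, xs = np.mgrid[0:10, 0:10]
map_x = (xs + 2).astype(np.float32)
map_y = ys.astype(np.float32)

shifted = cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR)
print(shifted[0, 0], img[0, 2])  # 2.0 2.0 -- the output pulls from the source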

So here I've created a function interpolate_frames() which will interpolate over however many frames you want from the flow. It works exactly as we discussed in the comments, but note the flipped order of curr and prev inside calcOpticalFlowFarneback().

The timelapse video above is a bad candidate for this since the inter-frame movement is very high. Instead, I'll use a short clip from another video shot in the same location as the input.

import cv2
import numpy as np


def interpolate_frames(frame, coords, flow, n_frames):
    frames = [frame]
    for f in range(1, n_frames):
        # Move a fraction of the way along each flow vector; remap()
        # then pulls pixel values from those sub-pixel source coordinates.
        pixel_map = coords + (f/n_frames) * flow
        inter_frame = cv2.remap(frame, pixel_map, None, cv2.INTER_LINEAR)
        frames.append(inter_frame)
    return frames


cap = cv2.VideoCapture('vancouver.mp4')

fps = cap.get(cv2.CAP_PROP_FPS)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter('optflow-inter1a.mp4', fourcc, fps, (w, h))

optflow_params = [0.5, 3, 15, 3, 5, 1.2, 0]

frame_exists, prev_frame = cap.read()
prev = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
y_coords, x_coords = np.mgrid[0:h, 0:w]
coords = np.float32(np.dstack([x_coords, y_coords]))  # identity sampling map in (x, y) order

while cap.isOpened():
    frame_exists, curr_frame = cap.read()
    if frame_exists:
        curr = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(curr, prev, None, *optflow_params)
        inter_frames = interpolate_frames(prev_frame, coords, flow, 4)
        for frame in inter_frames:
            out.write(frame)
        prev_frame = curr_frame
        prev = curr
    else:
        break

cap.release()
out.release()

And here's the output. There are 4 frames for every frame in the original, so it's slowed down 4x. Of course, black edge pixels will creep in, so you'll probably want to do some sort of border extension of your frames (you can use cv2.copyMakeBorder()) to repeat similar edge pixels, and/or crop the final output a bit to get rid of that. Note that most video stabilization algorithms crop the image for similar reasons. That's part of the reason why, when you switch your phone camera to video, you'll notice a larger focal length (it looks a bit zoomed in).
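If you want to try the copyMakeBorder() route, here's a rough sketch of a drop-in variant of interpolate_frames() above: replicate-pad the frame before remapping and shift the sampling map into the padded frame, so flow vectors near the edges sample repeated edge pixels instead of out-of-bounds black. The border width b is an arbitrary pick; it should be at least as large as your biggest expected flow magnitude.

def interpolate_frames_padded(frame, coords, flow, n_frames, b=20):
    # Pad with replicated edge pixels so out-of-frame samples stay plausible.
    padded = cv2.copyMakeBorder(frame, b, b, b, b, cv2.BORDER_REPLICATE)
    frames = [frame]
    for f in range(1, n_frames):
        # Offset the identity map by b so it indexes into the padded frame;
        # the output size still matches the map, i.e. the original (h, w).
        pixel_map = (coords + b) + (f/n_frames) * flow
        frames.append(cv2.remap(padded, pixel_map, None, cv2.INTER_LINEAR))
    return frames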

alkasm
  • Then, having the flow vector, the previous frame and the next frame, to find the value of a pixel in the interpolated frame I should use this. For example, if I wanted to find the x coordinate in the previous frame: `x_prev = x - 0.5 * flow[x,y][0]` And the x coordinate on the next frame: `x_next = x + 0.5 * flow[x,y][0]` Am I right? –  Dec 10 '17 at 07:39
  • Yep! That's how you could generate interframe interpolation. Again be cautious of the ordering here, should be `flow[y, x]`. However! And this is an important point: the flow vector will give inter-pixel measurements (i.e. `flow[y, x][0]` might be `3.105`). You'll of course need to round or truncate to integer indices. But what happens if two values get mapped to the same point after rounding? Similarly, what happens if there are some pixels which don't get mapped to? That will happen in a lot of places. You should use `cv2.remap` to do the interpolation for you and it will take care of that. – alkasm Dec 10 '17 at 07:58
  • @kelirkenan also, check out my answer [here](https://stackoverflow.com/a/46524544/5087436) showing how to use `cv2.remap()`. The docs can be a little confusing, and this should get you up to speed in no time. Additionally [this](https://stackoverflow.com/a/44538714/5087436) answer shows how to immediately apply `cv2.remap()` from the optical flow result. – alkasm Dec 10 '17 at 08:01
  • Perfect! Thanks. –  Dec 10 '17 at 08:36
  • @kelirkenan I actually went ahead and updated showing how to interpolate with `remap()`. I hadn't done this myself, so I was just curious how it worked. Check it out, the results are pretty cool! – alkasm Dec 10 '17 at 09:55
  • This is a great answer but I'm curious if it can be used with sporadic random data from a numpy array. I have values in a large 2D array that vary between 0-60. I've divided the array by 255 for a normalization and then passed into the `cv2.calcOpticalFlowFarneback(curr, prev, None, *optflow_params)` function. The max flow value seems really small `1.492787e-11` and I am seeing no values in the remapped array. I've also tried tweaking the optflow_params. Any ideas? – wuffwuff Mar 01 '20 at 17:38