2

TL;DR: Is there anyway I can get rid of my second for-loop?

I have a time series of points on a 2D-grid. To get rid of fast fluctuations of their position, I average the coordinates over a window of frames. Now in my case, it's possible for the points to cover a larger distance than usual. I don't want to include frames for a specific point, if it travels farther than the cut_off value.

In the first for-loop, I go over all frames and define the moving window. I then calculate the distances between the current frame and each frame in the moving window. After I grab only those positions from all frames, where both the x and y component did not travel farther than cut_off. Now I want to calculate the mean positions for every point from all these selected frames of the moving window (note: the number of selected frames can be smaller than n_window). This leads me to the second for-loop. Here I iterate over all points and actually grab the positions from the frames, in which the current point did not travel farther than cut_off. From these selected frames I calculate the mean value of the coordinates and use it as the new value for the current frame.

This very last for-loop slows down the whole processing. I can't come up with a better way to accomplish this calculation. Any suggestions?

MWE

Put in comments for clarification.

import numpy as np

# Generate a timeseries with 1000 frames, each 
# containing 50 individual points defined by their 
# x and y coordinates
n_frames = 1000
n_points = 50
n_coordinates = 2

timeseries = np.random.randint(-100, 100, [n_frames, n_points, n_coordinates])

# Set window size to 20 frames
n_window = 20

# Distance cut off
cut_off = 60

# Set up empty array to hold results
avg_data_store = np.zeros([n_frames, timeseries.shape[1], 2])

# Iterate over all frames
for frame in np.arange(0, n_frames):
    # Set the frame according to the window size that we're looking at
    t_before = int(frame - (n_window / 2))
    t_after = int(frame + (n_window / 2))

    # If we're trying to access frames below 0, set the lowest one to 0
    if t_before < 0:
        t_before = 0

    # Trying to access frames that are not in the trajectory, set to last frame
    if t_after > n_frames - 1:
        t_after = n_frames - 1

    # Grab x and y coordinates for all points in the corresponding window
    pos_before = timeseries[t_before:frame]
    pos_after = timeseries[frame + 1:t_after + 1]
    pos_now = timeseries[frame]

    # Calculate the distance between the current frame and the windows before/after
    d_before = np.abs(pos_before - pos_now)
    d_after = np.abs(pos_after - pos_now)

    # Grab indices of frames+points, that are below the cut off
    arg_before = np.argwhere(np.all(d_before < cut_off, axis=2))
    arg_after = np.argwhere(np.all(d_after < cut_off, axis=2))

    # Iterate over all points
    for i in range(0, timeseries.shape[1]):
        # Create temp array
        temp_stack = pos_now[i]

        # Grab all frames in which the current point did _not_
        # travel farther than `cut_off`
        all_before = arg_before[arg_before[:, 1] == i][:, 0]
        all_after = arg_after[arg_after[:, 1] == i][:, 0]

        # Grab the corresponding positions for this points in these frames
        all_pos_before = pos_before[all_before, i]
        all_pos_after = pos_after[all_after, i]

        # If we have any frames for that point before / after
        # stack them into the temp array
        if all_pos_before.size > 0:
            temp_stack = np.vstack([all_pos_before, temp_stack])
        if all_pos_after.size > 0:
            temp_stack = np.vstack([temp_stack, all_pos_after])

        # Calculate the moving window average for the selection of frames
        avg_data_store[frame, i] = temp_stack.mean(axis=0)
Michael Gecht
  • 1,374
  • 1
  • 17
  • 26
  • It sounds a bit like you are trying to re-invent a Kalman filter. – Paul Brodersen Aug 25 '17 at 10:37
  • 1
    Something like: https://stackoverflow.com/questions/13901997/kalman-2d-filter-in-python – Paul Brodersen Aug 25 '17 at 12:21
  • 1
    Just to clarify: is the problem that (a) there are measurement artifacts / outliers that you want to get rid off, or (b) that there are genuine jumps in the movement of your particles that you don't want to average out through a moving window average? – Paul Brodersen Aug 25 '17 at 15:50
  • @Paul It's the second problem. Basically I'm simulating particles in a box and through periodic boundary conditions they are able to jump across the box barriers. If a particle continuously jumps across the barrier in a given window of frames, its average position is in the middle of the box. – Michael Gecht Sep 04 '17 at 08:43

1 Answers1

0

If you are fine with calculating the cutoff distance in x and y separately, you can use scipy.ndimage.generic_filter.

enter image description here

import numpy as np
from scipy.ndimage import generic_filter

def _mean(x, cutoff):
    is_too_different = np.abs(x - x[len(x) / 2]) > cutoff
    return np.mean(x[~is_too_different])

def _smooth(x, window_length=5, cutoff=1.):
    return generic_filter(x, _mean, size=window_length, mode='nearest', extra_keywords=dict(cutoff=cutoff))

def smooth(arr, window_length=5, cutoff=1., axis=-1):
    return np.apply_along_axis(_smooth, axis, arr, window_length=window_length, cutoff=cutoff)

# --------------------------------------------------------------------------------

def _simulate_movement_2d(T, fraction_is_jump=0.01):

    # generate random velocities with a few "jumps"
    velocity = np.random.randn(T, 2)
    is_jump = np.random.rand(T) < fraction_is_jump
    jump = 10 * np.random.randn(T, 2)
    jump[~is_jump] = 0.

    # pre-allocate position and momentum arrays
    position = np.zeros((T,2))
    momentum = np.zeros((T,2))

    # initialise the first position
    position[0] = np.random.randn(2)

    # update position using velocity vector:
    # smooth movement by not applying the velocity directly
    # but rather by keeping track of the momentum
    for ii in range(2,T):
        momentum[ii] = 0.9 * momentum[ii-1] + 0.1 * velocity[ii-1]
        position[ii] = position[ii-1] + momentum[ii] + jump[ii]

    # add some measurement noise
    noise = np.random.randn(T,2)
    position += noise
    return position

def demo(nframes=1000, npoints=3):
    # create data
    positions = np.array([_simulate_movement_2d(nframes) for ii in range(npoints)])

    # format to (nframes, npoints, 2)
    position = positions.transpose([1, 0, 2])

    # smooth
    smoothed = smooth(positions, window_length=11, cutoff=5., axis=1)

    # plot
    x, y = positions.T
    xs, ys = smoothed.T

    import matplotlib.pyplot as plt
    fig, ax = plt.subplots(1,1)
    ax.plot(x, y, 'o')
    ax.plot(xs, ys, 'k-', alpha=0.3, lw=2)
    plt.show()

demo()
Paul Brodersen
  • 11,221
  • 21
  • 38