
I have a DataFrame which contains X & Y data for many trajectories (not GPS data).

I am trying to figure out how to resample/time-normalize them so that consecutive points are evenly spaced along each trajectory.

As they are right now, there are regions of the trajectories with higher density of points.

In the scatterplots below, I show one of the overall trajectories, followed by a zoomed-in portion of it to show how the density of points changes (i.e., the spacing between points is irregular).

[Figure: overall trajectory]

[Figure: zoomed-in portion with irregular spacing between points]

My dataframes look like this:

     (0, 1, 1)_mean_X  (0, 1, 1)_mean_Z  ...  (2, 2, 3)_mean_X  (2, 2, 3)_mean_Z
0          -15.856713          5.002617  ...        -15.874083         -5.000582
1          -15.831320          5.003529  ...        -15.848551         -5.000925
2          -15.805927          5.004441  ...        -15.823020         -5.001268
3          -15.780534          5.005353  ...        -15.797489         -5.001611
4          -15.755141          5.006265  ...        -15.771958         -5.001955
..                ...               ...  ...               ...               ...
995         15.547392         11.280298  ...         15.257689        -12.455845
996         15.548967         11.278968  ...         15.258225        -12.457202
997         15.550542         11.277638  ...         15.258761        -12.458560
998         15.552116         11.276309  ...         15.259296        -12.459917
999         15.553691         11.274979  ...         15.259832        -12.461275
CentauriAurelius
  • Hi, can you provide a sample of your data? – PieCot Feb 15 '21 at 07:29
  • you can resample with ... erm [`resample`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html) – Stef Feb 15 '21 at 08:09
  • @Stef not if you don't have a datetime-like index – anon01 Feb 15 '21 at 08:15
  • @anon01 correct, but OP wrote he wants to *time-normalize* so I figured the x-values would be some kind of time information (timestamps for instance), but yes - your answer is excellent for the given case (+1) – Stef Feb 15 '21 at 08:17
  • The sample doesn't look right. Why did you say you have X and Y coordinates when there are columns named Z in the data, and why are there more than two columns? – user202729 Feb 16 '21 at 10:29
  • Z is equal to Y in this dataset. The dataset contains many different trajectories. Only one is plotted as an example. – CentauriAurelius Feb 16 '21 at 10:31

1 Answer


Pandas has an `interpolate` method, but for processing like this I would prefer numpy/scipy. Their vectorized functions are often faster than the pandas equivalents. Example:

import numpy as np
import pandas as pd
from scipy.interpolate import interp1d

# example data: x is unevenly spaced (logarithmic), y is a function of x
x = np.logspace(0, 2, 300)
y = x**2
df = pd.DataFrame(np.array([x, y]).T, columns=list("xy"))

# define interpolation function (linear by default):
f = interp1d(x, y)

# create new df with desired x vals, generate y with interp function:
x_new = np.linspace(x.min(), x.max(), 1000)
y_new = f(x_new)
df_new = pd.DataFrame(np.array([x_new, y_new]).T, columns=["x_new", "y_new"])

Note that `f(x_new)` raises a `ValueError` if any value in `x_new` lies outside the original domain of `x`. This makes sense, as it's just linear interpolation and `interp1d` does not extrapolate by default.
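
If both columns are spatial coordinates and only the row order says which point comes next, the same interpolation idea can be applied against the cumulative distance travelled along the path rather than against `x`. Below is a minimal sketch under that assumption. The function name `resample_by_arc_length` and the demo curve are made up for illustration, and `np.interp` is used in place of `interp1d` to keep the example numpy-only. Because the interpolation is linear along the original polyline, the new points are evenly spaced along that polyline, so straight-line gaps between them are only approximately equal on curved sections.

```python
import numpy as np

def resample_by_arc_length(x, y, n_points):
    """Return n_points (x, y) samples evenly spaced along the polyline through the input."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # length of each segment between consecutive points
    seg = np.hypot(np.diff(x), np.diff(y))
    # cumulative distance travelled at each original point, starting at 0
    s = np.concatenate(([0.0], np.cumsum(seg)))
    # drop repeated points so that s is strictly increasing
    keep = np.concatenate(([True], seg > 0))
    s, x, y = s[keep], x[keep], y[keep]
    # evenly spaced distances along the path, then interpolate each coordinate
    s_new = np.linspace(0.0, s[-1], n_points)
    return np.interp(s_new, s, x), np.interp(s_new, s, y)

# demo on a made-up curve whose points are much denser near the start
t = np.linspace(0.0, 1.0, 200) ** 3
x_demo = 20.0 * t
y_demo = 5.0 * np.sin(np.pi * t)
x_even, y_even = resample_by_arc_length(x_demo, y_demo, 50)

# gaps between consecutive points before and after resampling
gaps_before = np.hypot(np.diff(x_demo), np.diff(y_demo))
gaps_after = np.hypot(np.diff(x_even), np.diff(y_even))
```

For a DataFrame like the one in the question, this would presumably be called once per trajectory, passing the matching pair of columns (for example `(0, 1, 1)_mean_X` and `(0, 1, 1)_mean_Z`) as `x` and `y`.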

anon01
  • Thanks, but it doesn't seem to be working on my end. My x and y values originally range from -20 to +20, but after doing this the values go into the millions. Also, the x & y data are both positional/spatial, so the index is the only measure of 'time'. – CentauriAurelius Feb 15 '21 at 08:22
  • I see that when I run your code on my data – CentauriAurelius Feb 15 '21 at 08:26
  • Thanks, I'm trying to figure out how to get it working with my data and will accept & upvote as soon as I do. It seems like I need to run this code on the X and Y positions separately, but when I tried to do that it didn't return evenly spaced points. – CentauriAurelius Feb 15 '21 at 08:47
  • From a discussion with the OP https://stackoverflow.com/questions/66221994/interpolation-to-evenly-space-trajectory-data-for-different-curves#comment117079104_66221994 it appears that you misunderstood the question... The `x` column is not the time data, it's a coordinate value. – user202729 Feb 16 '21 at 10:59