I have a dataset with the GPS coordinates and the timestamp of an individual's trajectory. It looks like this: [image]

The data is recorded with a smartphone app that is designed to collect a reading every second, as you can see in the image. However, as you can also see, in some cases (due to GPS errors or satellite connection) the readings are further apart: in the figure the first time difference is 2 seconds, for example, and in some cases it is even longer, 4-5 seconds or more.

For my study, I need the GPS coordinates (latitude and longitude) every 1 second, so I thought I'd do an interpolation, creating new rows of data with the coordinates and time missing in each case.

I have been investigating for a long time but I can't find any way to do it. My idea is to first create the missing rows with "NaN" values by reindexing, and then fill the latitude, longitude and time columns using the interpolation function df.interpolate(). But I can't figure out how to do it.

If anyone has any ideas on how to do this it would be a great help to me.

Thank you very much.

Ferran
  • Please paste the data as text, not as an image. – Chris Jan 13 '21 at 16:50
  • parse time column to datetime (if not already done) and [resample](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html) to 1s on that column? You might also want to set time as index. – FObersteiner Jan 14 '21 at 07:01

1 Answer

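Basically, you resample the data so that there is one row for every second. Your current index won't be representative for that, so first set the parsed time column as the index. Calling .first() on the resampler keeps each recorded reading and leaves NaN in every second that has no reading (a back-fill such as .bfill(limit=1) would instead copy the next reading into the gap, so there would be nothing left to interpolate). interpolate() then fills the NaN rows.

import pandas as pd

# Use the parsed timestamps as the index so the frame can be resampled
df.index = pd.to_datetime(df['time'])
# One row per second; seconds without a reading hold NaN
df = df[['latitude', 'longitude']].resample("1s").first()
# Fill the NaN rows linearly, weighted by the time index
df = df.interpolate(method="time")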
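
To make the resample-then-interpolate step concrete, here is a minimal sketch on made-up data. The column names (time, latitude, longitude) and the sample values are assumptions, since the original data was only posted as an image, and .first() is used here so that seconds without a reading are left as NaN before interpolating.

```python
import pandas as pd

# Hypothetical sample: readings at 0 s, 2 s and 5 s (names and values are assumptions)
raw = pd.DataFrame({
    "time": ["2021-01-13 16:00:00", "2021-01-13 16:00:02", "2021-01-13 16:00:05"],
    "latitude": [41.0000, 41.0002, 41.0005],
    "longitude": [2.0000, 2.0004, 2.0010],
})

# Move the parsed timestamps into the index and drop the string column
track = raw.set_index(pd.to_datetime(raw["time"])).drop(columns="time")

# One row per second; seconds without a reading are NaN
per_second = track.resample("1s").first()

# Linear interpolation weighted by the time index
filled = per_second.interpolate(method="time")
print(filled)
```

The result has six rows (16:00:00 through 16:00:05), with the three missing seconds filled in between the recorded readings.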

The problem with this solution is that pandas' interpolate() treats latitude and longitude as if they were planar coordinates and interpolates them linearly, whereas GPS positions are angular coordinates on a curved surface. A nice explanation can be found in this answer.
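
To illustrate the difference, here is a minimal sketch of interpolating along the great circle between two points instead of linearly in latitude/longitude. It assumes a perfectly spherical Earth, and the function name great_circle_point is made up for this example. For gaps of only a few seconds the result will likely be almost identical to the linear one, so treat it as an illustration rather than a required step.

```python
import math

def great_circle_point(lat1, lon1, lat2, lon2, f):
    """Return (lat, lon) in degrees, a fraction f (0..1) of the way from point 1 to point 2."""
    p1, l1, p2, l2 = map(math.radians, (lat1, lon1, lat2, lon2))
    # Angular distance between the two points (haversine formula)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin((l2 - l1) / 2) ** 2)
    d = 2 * math.atan2(math.sqrt(a), math.sqrt(max(0.0, 1 - a)))
    if d == 0:
        # Identical points: nothing to interpolate
        return lat1, lon1
    # Weights of the two endpoints along the great circle
    w1 = math.sin((1 - f) * d) / math.sin(d)
    w2 = math.sin(f * d) / math.sin(d)
    # Weighted sum of the endpoints as 3D unit vectors
    x = w1 * math.cos(p1) * math.cos(l1) + w2 * math.cos(p2) * math.cos(l2)
    y = w1 * math.cos(p1) * math.sin(l1) + w2 * math.cos(p2) * math.sin(l2)
    z = w1 * math.sin(p1) + w2 * math.sin(p2)
    return math.degrees(math.atan2(z, math.hypot(x, y))), math.degrees(math.atan2(y, x))

# Missing reading halfway through a 2-second gap: f = 0.5
mid_lat, mid_lon = great_circle_point(0.0, 0.0, 0.0, 90.0, 0.5)
```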

A viable solution would be to:

  • Find out which coordinate reference system (CRS) your coordinates are recorded in (most probably EPSG:4326, i.e. WGS 84)
  • Convert them to a planar (projected) coordinate system, for instance EPSG:27700 (British National Grid, which is only suitable for data in Great Britain)
  • Perform a linear interpolation on the converted coordinates
  • Convert them back to the spherical system and overwrite your lat/lon dataframe columns

More details can be found in this answer. A short implementation:

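import pandas as pd
from pyproj import Transformer

# EPSG:4326 takes (lat, lon); EPSG:27700 returns (easting, northing)
transformer = Transformer.from_crs(4326, 27700)
back_transformer = Transformer.from_crs(27700, 4326)

# Start from the original, unresampled df and project the recorded points
df.index = pd.to_datetime(df['time'])
x, y = transformer.transform(df.latitude.values, df.longitude.values)
df['x'] = x
df['y'] = y

# One row per second (NaN where no reading), then interpolate in the plane
df = df[['x', 'y']].resample("1s").first().interpolate()

# Convert the interpolated planar points back to latitude/longitude
lat, lon = back_transformer.transform(df.x.values, df.y.values)
df['latitude'] = lat
df['longitude'] = lon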

Hope this helps you solve your problem!

RobertMoga