I am using pandas to interpolate data points in time. However, when I resample and interpolate, I get different results for the same timestamp depending on the resampling rate.
Here is a test example:
import pandas as pd
import datetime
data = pd.DataFrame({'time': list(map(lambda a: datetime.datetime.strptime(a, '%Y-%m-%d %H:%M:%S'),
                                      ['2021-03-28 12:00:00', '2021-03-28 12:01:40',
                                       '2021-03-28 12:03:20', '2021-03-28 12:05:00',
                                       '2021-03-28 12:06:40', '2021-03-28 12:08:20',
                                       '2021-03-28 12:10:00', '2021-03-28 12:11:40',
                                       '2021-03-28 12:13:20', '2021-03-28 12:15:00'])),
                     'latitude': [44.0, 44.00463175663968, 44.00919766508212,
                                  44.01357245844425, 44.0176360866699, 44.02127701531401,
                                  44.02439529286458, 44.02690530159084, 44.02873811544965,
                                  44.02984339933479],
                     'longitude': [-62.75, -62.74998054893869, -62.748902164559304,
                                   -62.74679419470262, -62.7437142666763, -62.739746727555016,
                                   -62.735000345048086, -62.72960533041183, -62.72370976436673,
                                   -62.717475524320704]})
data.set_index('time', inplace=True)
a = data.resample('20s').interpolate(method='time')
b = data.resample('60s').interpolate(method='time')
print(a.iloc[:18:3])
print(b.iloc[:6])
# --- OUTPUT --- #
                      latitude  longitude
time
2021-03-28 12:00:00  44.000000 -62.750000
2021-03-28 12:01:00  44.002779 -62.749988  # <-- Different Values
2021-03-28 12:02:00  44.005545 -62.749765  # <-- Different Values
2021-03-28 12:03:00  44.008284 -62.749118  # <-- Different Values
2021-03-28 12:04:00  44.010948 -62.748059  # <-- Different Values
2021-03-28 12:05:00  44.013572 -62.746794
                      latitude  longitude
time
2021-03-28 12:00:00  44.000000 -62.750000
2021-03-28 12:01:00  44.002714 -62.749359  # <-- Different Values
2021-03-28 12:02:00  44.005429 -62.748718  # <-- Different Values
2021-03-28 12:03:00  44.008143 -62.748077  # <-- Different Values
2021-03-28 12:04:00  44.010858 -62.747435  # <-- Different Values
2021-03-28 12:05:00  44.013572 -62.746794
The a dataframe and the b dataframe should give the same values on the minute, but in most cases they differ at those timestamps.
Does anyone know what could be causing this? When I plot the full results, it looks like resampling on the minute causes pandas to ignore data at timestamps that are not on the minute (12:01:40 and 12:03:20, for example).
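One way to test that suspicion is to keep every original timestamp in the index while interpolating, and only afterwards select the target grid. The sketch below is a minimal illustration, not a confirmed explanation of what resample does internally: the helper name interpolate_on_grid and the small made-up series are assumptions introduced here, and only standard pandas calls (date_range, Index.union, reindex, interpolate) are used.

```python
import pandas as pd

# Small made-up series sampled every 100 s (illustrative values only).
idx = pd.date_range("2021-03-28 12:00:00", periods=4, freq="100s")
s = pd.Series([0.0, 1.0, 4.0, 9.0], index=idx)


def interpolate_on_grid(series, freq):
    """Hypothetical helper: interpolate by time using every original sample,
    then keep only the timestamps on the requested grid."""
    grid = pd.date_range(series.index[0], series.index[-1], freq=freq)
    # Union the original timestamps with the grid so no sample is dropped
    # before interpolation; grid-only timestamps become NaN and get filled.
    combined = series.reindex(series.index.union(grid))
    return combined.interpolate(method="time").reindex(grid)


a = interpolate_on_grid(s, "20s")
b = interpolate_on_grid(s, "60s")

# Compare the two results wherever the grids share a timestamp.
shared = a.index.intersection(b.index)
print((a.loc[shared] - b.loc[shared]).abs().max())
```

If the two grids agree on the shared timestamps with this approach but not with resample(...).interpolate(...), that would support the idea that the off-grid samples are being discarded before interpolation in the installed pandas version.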