
I'm trying to learn from IoT time-series data. The data comes from two different sources. In some measurements the difference between the sources is small: one source has 11 rows and the other has 15. In other measurements one source has 30 rows and the other has 240.

I thought to upsample by interpolating:

 df.resample('20ms').interpolate()

but saw that it deletes some rows. Is there a way to interpolate without deleting rows, or should I delete rows instead?
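One way to keep every original row when moving to a coarser grid (a sketch, not from the post; the 100 ms index, the 60 ms target grid, and the column name `a` are made-up stand-ins) is to reindex onto the union of the original index and the target grid, then interpolate:

```python
import pandas as pd

# Hypothetical 100 ms series standing in for the sparser source.
idx = pd.date_range('2011-01-01', periods=5, freq='100ms')
df = pd.DataFrame({'a': [100, 200, 300, 400, 500]}, index=idx)

# Build the coarser 60 ms grid, then take the union with the original
# index so that no original timestamp is dropped.
target = pd.date_range(df.index[0], df.index[-1], freq='60ms')
combined = df.reindex(df.index.union(target)).interpolate(method='time')

print(combined.loc[idx])     # the five original rows, unchanged
print(combined.loc[target])  # the 60 ms grid, interpolated in time
```

`resample('60ms')` alone drops the original 100 ms timestamps that do not land on the 60 ms grid; the union-reindex keeps both sets of timestamps in one frame.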

EDIT - data and code:

#!/usr/bin/env python3.6
import pandas as pd

first_df_file_name = 'interpolate_test.in'
df = pd.read_csv(first_df_file_name, header=0, delimiter=' ')
print(df.head(5))

# Attach a synthetic 100 ms datetime index so resample() can be used.
new_col = pd.date_range('1/1/2011 00:00:00.000000', periods=len(df.index), freq='100ms')
df.insert(loc=0, column='date', value=new_col)
df.set_index('date', inplace=True)

upsampled = df.resample('20ms').interpolate()
print('20 ms, num rows', len(upsampled.index))
print(upsampled.head(5))
upsampled.to_csv('test_20ms.out')

upsampled = df.resample('60ms').interpolate()
print('60 ms, num rows', len(upsampled.index))
print(upsampled.head(5))
upsampled.to_csv('test_60ms.out')

This is the test input file (`interpolate_test.in`):

a b
100 200
200 400
300 600
400 800
500 1000
600 1100
700 1200
800 1300
900 1400
1000 2000

Here is the output (parts of it):

 //output when interpolating by 20 ms - this is fine
                         a      b
 date                                 
 2011-01-01 00:00:00.000  100.0  200.0
 2011-01-01 00:00:00.020  120.0  240.0
 2011-01-01 00:00:00.040  140.0  280.0
 2011-01-01 00:00:00.060  160.0  320.0
 2011-01-01 00:00:00.080  180.0  360.0

 //output when interpolating by 60 ms - data is lost
 60 ms, num rows 16
                         a      b
 date                                 
 2011-01-01 00:00:00.000  100.0  200.0
 2011-01-01 00:00:00.060  160.0  320.0
 2011-01-01 00:00:00.120  220.0  440.0
 2011-01-01 00:00:00.180  280.0  560.0
 2011-01-01 00:00:00.240  340.0  680.0

So, should I delete rows from the larger source instead of interpolating? And if I do interpolate, how can I avoid losing data?

nmnir
    Hi, please see [How to ask](https://stackoverflow.com/help/how-to-ask) and [How to create a MCVE](https://stackoverflow.com/help/mcve). For `pandas`, see [How to ask a good pandas question](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – Evan Jun 01 '19 at 03:49
  • Edited; posted with test data instead of the original data. – nmnir Jun 01 '19 at 08:00
  • Like this? https://stackoverflow.com/questions/35918248/keep-original-data-points-when-padding-a-signal-with-pandas – Evan Jun 02 '19 at 21:53
  • @Evan I'm not sure that works as needed. According to https://datascience.stackexchange.com/questions/25924/difference-between-interpolate-and-fillna-in-pandas, fillna cannot receive a function as a parameter. So if there are 3 missing values between 100 and 200, I can't fill them to get 100, 125, 150, 175, 200 – nmnir Jun 05 '19 at 07:09
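For what it's worth, plain `interpolate()` (as opposed to `fillna`) does produce exactly that evenly spaced fill between known endpoints; a quick check with a positional Series:

```python
import pandas as pd

# interpolate() fits values linearly between the known endpoints, so three
# gaps between 100 and 200 become evenly spaced steps of 25.
s = pd.Series([100, None, None, None, 200])
print(s.interpolate().tolist())  # [100.0, 125.0, 150.0, 175.0, 200.0]
```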

0 Answers