3

I have a CSV file as below:

                  t  dd  hh  v.amm  v.alc  v.no2  v.cmo   aqi
0      201811170000  17   0   0.40   0.41   1.33   1.55  2.45
1      201811170002  17   0   0.40   0.41   1.34   1.51  2.46
2      201811170007  17   0   0.40   0.37   1.35   1.45  2.40

Now I have to fill in the missing minutes by last observation carried forward. Expected output:

                  t  dd  hh  v.amm  v.alc  v.no2  v.cmo   aqi
0      201811170000  17   0   0.40   0.41   1.33   1.55  2.45
1      201811170001  17   0   0.40   0.41   1.33   1.55  2.45
2      201811170002  17   0   0.40   0.41   1.34   1.51  2.46
2      201811170003  17   0   0.40   0.41   1.34   1.51  2.46
2      201811170004  17   0   0.40   0.41   1.34   1.51  2.46
2      201811170005  17   0   0.40   0.41   1.34   1.51  2.46
2      201811170006  17   0   0.40   0.41   1.34   1.51  2.46
3      201811170007  17   0   0.40   0.37   1.35   1.45  2.40

I tried following this link but unable to achieve the expected output. Sorry I'm new to coding.

Joey
  • 73
  • 5

1 Answers1

3

First create DatetimeIndex by to_datetime and DataFrame.set_index and then change frequency by DataFrame.asfreq:

df['t'] = pd.to_datetime(df['t'], format='%Y%m%d%H%M')
df = df.set_index('t').sort_index().asfreq('Min', method='ffill')
print (df)
                     dd  hh  v.amm  v.alc  v.no2  v.cmo   aqi
t                                                            
2018-11-17 00:00:00  17   0    0.4   0.41   1.33   1.55  2.45
2018-11-17 00:01:00  17   0    0.4   0.41   1.33   1.55  2.45
2018-11-17 00:02:00  17   0    0.4   0.41   1.34   1.51  2.46
2018-11-17 00:03:00  17   0    0.4   0.41   1.34   1.51  2.46
2018-11-17 00:04:00  17   0    0.4   0.41   1.34   1.51  2.46
2018-11-17 00:05:00  17   0    0.4   0.41   1.34   1.51  2.46
2018-11-17 00:06:00  17   0    0.4   0.41   1.34   1.51  2.46
2018-11-17 00:07:00  17   0    0.4   0.37   1.35   1.45  2.40

Or use DataFrame.resample with Resampler.ffill:

df['t'] = pd.to_datetime(df['t'], format='%Y%m%d%H%M')
df = df.set_index('t').sort_index().resample('Min').ffill()
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • KeyError: 't' `df = pd.read_csv ('data.csv', index_col="t")` Am I doing it wrong? – Joey Mar 05 '20 at 11:58
  • @Joey - Only use `df = pd.read_csv ('data.csv')` and then apply solution, `t` is column name, if different check `print (df.columns.tolist()` – jezrael Mar 05 '20 at 12:01
  • I tried again and got ValueError: index must be monotonic increasing or decreasing. Sorry for the trouble. The solution works when using DataFrame.resample! – Joey Mar 05 '20 at 12:07
  • @Joey - No problem, try sorting before `asfreq` or `resample`, answer was edited. Also check if datetimes are converted correctly. – jezrael Mar 05 '20 at 12:09