I'm pretty new to coding and have a problem resampling my dataframe with Pandas. I need to resample my data ("value") to 10-minute means (13:30, 13:40, etc.). The problem is: the data start around 13:36, and I can't fix this by hand because I need to do it for 143 dataframes. Resampling puts each mean at the start of its interval (e.g. 13:40 for the second value), but because 13:30 is not part of my index, the first value gets lost.
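The behaviour described above can be reproduced with a minimal sketch. The sample data here is made up for illustration (one reading per second from 13:36:00 to 13:44:59, with the row number as the value); only the column names `value` and `tenminavg` come from the question.

```python
import pandas as pd

# Hypothetical sample data: 540 one-second readings starting at 13:36:00.
idx = pd.date_range("2022-01-03 13:36:00", periods=540, freq=pd.Timedelta(seconds=1))
df = pd.DataFrame({"value": range(540)}, index=idx)

# The resampled means are labelled with the bin start: 13:30:00 and 13:40:00.
means = df["value"].resample("10min").mean()

# Assigning aligns on the index. 13:30:00 is not a row of df, so that mean
# has nowhere to go; only the 13:40:00 row receives a value.
df["tenminavg"] = means
n_filled_before_ffill = int(df["tenminavg"].notna().sum())

# Forward-filling only propagates from 13:40:00 onward, so every row
# before 13:40:00 stays NaN.
df["tenminavg"] = df["tenminavg"].ffill()
n_missing_after_ffill = int(df["tenminavg"].isna().sum())

print(means)
print(n_filled_before_ffill, n_missing_after_ffill)
```

In this sketch the rows from 13:36:00 to 13:39:59 never get a value, which matches the gap described in the question.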

I'm trying two different approaches here: First, I tried every option of resample() (offset, origin, convention, ...). Then I tried adding the missing values manually with a loop, which doesn't run properly because I didn't know how to access the correct spot on the list. The list does include all relevant values though. I also tried adding a row with 13:30 as the index on top of the dataframe but didn't manage to convince Python that my index is legit because it's a timestamp (this is not in the code).

Sorry for the very rough code, it just didn't work in several places which is why I'm asking here.

If you have a possible solution, please keep in mind that it has to function within an already long loop because of the many dataframes I have to work on simultaneously.

Thank you very much!

df["tenminavg"] = df["value"].resample("10Min").mean()
df["tenminavg"] = df["tenminavg"].ffill()

# My alternative: keep the resampled values in order to eventually access the first relevant timespan
ls1 = df["value"].resample("10Min").mean()

# This loop doesn't work. It should add the value for the first 10 min
for i in df.index:
    if df["tenminavg"][i] == "nan":
        if datetime.time(13, 30) <= df.index.time < datetime.time(13, 40):
            # Tried to access the corresponding data point in the list
            df["tenminavg"][i] = ls1.index.loc[i.floor("10Min")]["value"]
    else:
        continue
Leni
  • Can you clarify what you mean by the statement "The data start around 13:36 and I can't access them by hand because I need to do this for 143 dataframes. Resampling adds the mean at the respective index (e.g. 13:40 for the second value), but because 13:30 is not part of my indices, that value gets lost."? Is the issue that the time frame from 13:36 to 13:39 is not included in your resampled data? – itprorh66 Nov 13 '22 at 17:31
  • Your question can be improved by the addition of a minimal reproducible example consisting of sample input, expected output, actual output, and only the relevant code necessary to reproduce the problem. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for best practices related to Pandas questions. – itprorh66 Nov 13 '22 at 17:33
  • Hi, thanks for the feedback! The data looks like this (datetime, value): (2022-01-03 13:36:00, 8), (2022-01-03 13:36:01, 9), ...., (2022-01-03 13:40:00, 7), .... So the datetime is always a timestamp for the same day, starting from 13:36:00 in 1-second intervals, going until 15:54:00 or something (this differs a little for each individual dataframe). So yes, everything up until 13:39:59 is still filled with NaNs. After that, from 13:40:00, everything gets resampled neatly. – Leni Nov 13 '22 at 22:23
  • (hope this makes it clearer - in my example, "8" would be the corresponding value for "2022-01-03 13:36:00" etc.) – Leni Nov 13 '22 at 22:37
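Given the data layout described in these comments, one possible approach is to skip the index alignment entirely: floor each timestamp to its 10-minute bin and broadcast the bin mean back to every row with `groupby(...).transform("mean")`. This is only a sketch under assumptions: the helper name `add_ten_min_mean` and the sample data are hypothetical, and it assumes each dataframe has a `DatetimeIndex` and a numeric `value` column.

```python
import pandas as pd


def add_ten_min_mean(df, column="value", freq="10min"):
    """Return a copy of df with a 'tenminavg' column holding, for every row,
    the mean of `column` over the 10-minute bin that row falls into.
    (Hypothetical helper, not part of pandas.)"""
    out = df.copy()
    # Floor each timestamp to the start of its bin,
    # e.g. 13:36:00 -> 13:30:00 and 13:40:01 -> 13:40:00.
    bins = out.index.floor(freq)
    # transform("mean") returns one value per original row, so the partial
    # first bin (13:36:00 to 13:39:59) is filled like every other bin.
    out["tenminavg"] = out.groupby(bins)[column].transform("mean")
    return out


# Made-up sample: one reading per second from 13:36:00 to 13:44:59.
idx = pd.date_range("2022-01-03 13:36:00", periods=540, freq=pd.Timedelta(seconds=1))
sample = pd.DataFrame({"value": range(540)}, index=idx)

result = add_ten_min_mean(sample)
print(result.iloc[[0, 239, 240, 539]])
```

Because the helper takes one dataframe and returns a new one, it can be called once per dataframe inside an existing loop. Note that the first bin's mean only covers the rows that actually exist (from 13:36:00 on), not a full ten minutes.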
