I have data at every 1-minute interval, but sometimes a 1-minute data point is missing.

Two things to do:

1. Resample this 1-minute data to 15 minutes.

2. If a 15-minute timestamp is not present, then create the 15-minute timestamp and fill it with the nearest data value present in the dataframe. Data is below:

Date & Time (UTC)   Sea level (m)   
01-05-2020 00:00    2.498   
01-05-2020 00:01    2.492   
01-05-2020 00:02    2.485   
01-05-2020 00:03    2.48    
01-05-2020 00:04    2.473   
01-05-2020 00:05    2.466   
01-05-2020 00:06    2.46    
01-05-2020 00:07    2.455   
01-05-2020 00:08    2.446   
01-05-2020 00:09    2.441   
01-05-2020 00:10    2.434   
01-05-2020 00:11    2.425   
01-05-2020 00:12    2.414   
01-05-2020 00:13    2.407   
01-05-2020 00:14    2.399   
01-05-2020 00:15    2.393   
01-05-2020 00:16    2.387   
01-05-2020 00:17    2.38    
01-05-2020 00:18    2.374   
01-05-2020 00:19    2.366   
01-05-2020 00:20    2.36    
01-05-2020 00:21    2.353   
01-05-2020 00:22    2.349   
01-05-2020 00:23    2.344   
01-05-2020 00:24    2.339   
01-05-2020 00:25    2.337   
01-05-2020 00:26    2.331   
01-05-2020 00:27    2.326   
01-05-2020 00:28    2.324   
01-05-2020 00:29    2.32    
01-05-2020 00:31    2.314   
01-05-2020 00:32    2.307   
01-05-2020 00:33    2.307   
01-05-2020 00:34    2.303   
01-05-2020 00:35    2.3 
01-05-2020 00:36    2.296   
01-05-2020 00:37    2.291   
01-05-2020 00:38    2.286   
01-05-2020 00:39    2.285   
01-05-2020 00:40    2.28    
01-05-2020 00:41    2.274   
01-05-2020 00:42    2.272   
01-05-2020 00:43    2.27    
01-05-2020 00:44    2.262   
01-05-2020 00:46    2.254   
01-05-2020 00:47    2.25    
01-05-2020 00:48    2.249   
01-05-2020 00:49    2.245   
01-05-2020 00:50    2.239   
01-05-2020 00:51    2.232   
01-05-2020 00:52    2.227   
01-05-2020 00:53    2.223   
01-05-2020 00:54    2.22    
01-05-2020 00:55    2.212   
01-05-2020 00:56    2.208   
01-05-2020 00:57    2.205   
01-05-2020 00:58    2.2 
01-05-2020 00:59    2.195   
01-05-2020 01:00    2.191   
01-05-2020 01:01    2.188   
01-05-2020 01:02    2.182   
01-05-2020 01:03    2.181   
01-05-2020 01:04    2.175   
01-05-2020 01:05    2.172   
01-05-2020 01:06    2.166   
01-05-2020 01:07    2.162   
01-05-2020 01:08    2.159   
01-05-2020 01:09    2.155   
01-05-2020 01:10    2.151   
01-05-2020 01:11    2.149   
01-05-2020 01:12    2.144   
01-05-2020 01:13    2.139   
01-05-2020 01:14    2.134   
01-05-2020 01:15    2.131   
01-05-2020 01:16    2.128   
01-05-2020 01:17    2.121   
01-05-2020 01:18    2.116   
01-05-2020 01:19    2.113   
01-05-2020 01:20    2.109   
01-05-2020 01:21    2.105   
01-05-2020 01:22    2.1 
01-05-2020 01:23    2.095   
01-05-2020 01:24    2.086   
01-05-2020 01:25    2.087   
01-05-2020 01:26    2.083   
01-05-2020 01:27    2.081   
01-05-2020 01:28    2.076   
01-05-2020 01:29    2.075   
01-05-2020 01:30    2.07    
01-05-2020 01:31    2.067   
01-05-2020 01:32    2.06    
01-05-2020 01:33    2.057   
01-05-2020 01:34    2.05    
01-05-2020 01:35    2.049   
01-05-2020 01:36    2.043   
01-05-2020 01:37    2.04    
01-05-2020 01:38    2.035   
01-05-2020 01:39    2.03    
01-05-2020 01:40    2.023   
01-05-2020 01:41    2.02    
01-05-2020 01:42    2.014   
01-05-2020 01:43    2.006   
01-05-2020 01:44    2.004       
01-05-2020 01:46    1.996   
01-05-2020 01:47    1.985   
01-05-2020 01:48    1.979   
01-05-2020 01:49    1.974   
01-05-2020 01:50    1.97    
01-05-2020 01:51    1.964   
01-05-2020 01:52    1.959   
01-05-2020 01:53    1.956   
01-05-2020 01:54    1.951   
01-05-2020 01:55    1.945   
01-05-2020 01:56    1.939   
01-05-2020 01:57    1.938   
01-05-2020 01:58    1.935   
01-05-2020 01:59    1.929   
01-05-2020 02:00    1.925   
01-05-2020 02:01    1.922   

Expected output:

timestamp   Sea level(m)
01-05-2020 00:15    2.393
01-05-2020 00:30    2.318
01-05-2020 00:45    2.262
01-05-2020 01:00    2.191
01-05-2020 01:15    2.131
01-05-2020 01:30    2.07
01-05-2020 01:45    1.996
01-05-2020 02:00    1.925

As we can see, if we resample the data to every 15 minutes I won't get 01-05-2020 00:45, because it is missing in the original dataframe, so the value for 01-05-2020 00:45 should be the nearest available Sea level (m) reading. Thank you. My code didn't work out.

import pandas as pd
import numpy as np

df = pd.read_csv("df2.csv", header=0)
df['timestamp'] = pd.to_datetime(df['timestamp'])
resample_index = pd.date_range(start=df.index[0], end=df.index[-1], freq='10s')
#dummy_frame = pd.DataFrame(np.NaN, index=resample_index, columns=df.columns)
idx = pd.date_range(start='05-01-2020', end='05-30-2020', freq='15Min').strftime('%d-%m-%Y %H:%M')
df_resampled = df.combine_first(idx).interpolate(method='time', limit_direction='both', limit=None)
Ryan Ahmad
  • I will be happy to answer if you could provide the input data in a copy/pastable format and provide expected output data: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – David Erickson Jul 15 '20 at 01:32
  • @DavidErickson, the data has been changed and the question has been modified as required. Kindly let me know; I would really appreciate your help. Thank you. – Ryan Ahmad Jul 15 '20 at 01:50
  • I have answered, but your output for what to do with rows that don't have a value that falls on the 15th-minute interval is inconsistent, i.e. `2.318` in row 2 and `2.262` in row 3. Is my output your desired output? – David Erickson Jul 15 '20 at 02:41
  • please accept by clicking the "checkmark" next to my answer if it was helpful. – David Erickson Jul 28 '20 at 21:07

1 Answer

pd.Grouper works well for summarizing rows into fixed time intervals.

df['Date & Time (UTC)'] = pd.to_datetime(df['Date & Time (UTC)'])
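# Bin each row into a 15-minute group keyed on its timestamp, keep the first
# reading found in each bin, replace any completely empty bin with 0, and turn
# the result back into a DataFrame: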
df = df.groupby(pd.Grouper(key='Date & Time (UTC)', freq='15min'))['Sea level (m)'].first().fillna(0).reset_index()
df

output:

    Date & Time (UTC)   Sea level (m)
0   2020-01-05 00:00:00 2.498
1   2020-01-05 00:15:00 2.393
2   2020-01-05 00:30:00 2.314
3   2020-01-05 00:45:00 2.254
4   2020-01-05 01:00:00 2.191
5   2020-01-05 01:15:00 2.131
6   2020-01-05 01:30:00 2.070
7   2020-01-05 01:45:00 1.996
8   2020-01-05 02:00:00 1.925
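If a whole 15-minute bin ever has no rows at all, the `fillna(0)` above would leave a zero there. A different way to satisfy the question's "nearest value" wording is to build the 15-minute grid explicitly and pull the closest reading with `reindex(method='nearest')`. This is only a minimal sketch, assuming the CSV `df2.csv` from the question holds the two columns shown above:

import pandas as pd

# File name and column names are taken from the question; both are assumptions
# about how the data is stored on disk.
df = pd.read_csv("df2.csv")
df['Date & Time (UTC)'] = pd.to_datetime(df['Date & Time (UTC)'], format='%d-%m-%Y %H:%M')
s = df.set_index('Date & Time (UTC)')['Sea level (m)'].sort_index()

# 15-minute stamps spanning the data; method='nearest' fills any stamp that has
# no exact row (such as the missing 00:45) with the closest reading in time.
grid = pd.date_range(s.index.min().ceil('15min'), s.index.max().floor('15min'), freq='15min')
out = s.reindex(grid, method='nearest').rename_axis('timestamp').reset_index()
print(out)

Note that, like the output above, the grid here starts at the first aligned stamp (00:00) rather than at 00:15.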
David Erickson
  • No problem @RyanAhmad, please accept if it helped you solve it. – David Erickson Jul 15 '20 at 03:21
  • Hi, can you explain this `agg('first').fillna(0).reset_index()` line, please? Or can you provide a link where I can find a useful resource? – Ryan Ahmad Jul 15 '20 at 09:42
  • @RyanAhmad no problem! `..agg('first')` takes the first value (by row #/index) for each group of 15-minute bins. So if there is a row for 29 minutes and a row for 31 minutes, but nothing for 30 minutes, the value at 31 minutes would be the `first` value in that 15-minute group. If there was a value at 30 minutes, then that would be the first value. The syntax for `..agg('first')` is the same as `.min()` or `.sum()` if that's what you wanted to calculate. I actually tested that you can just do `.first()` (updated answer). With `.agg`, you can do custom operations as well, though. – David Erickson Jul 15 '20 at 10:10
  • The comment above was just about `.agg('first')`. In regards to `fillna(0)`, this is not required; I just did that to replace potential NaN values (which would happen if there was no data within a 15-minute timeframe). Finally, `.reset_index()` is required to turn it from a `Series` into a `DataFrame`. – David Erickson Jul 15 '20 at 10:15
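A small self-contained illustration of the points in the two comments above, using rows taken from the question's data around the missing 00:30 and 00:45 minutes:

import pandas as pd

# Rows copied from the question's data; the 00:30 and 00:45 minutes themselves are missing.
toy = pd.DataFrame({
    'Date & Time (UTC)': pd.to_datetime(['2020-05-01 00:29', '2020-05-01 00:31',
                                         '2020-05-01 00:44', '2020-05-01 00:46']),
    'Sea level (m)': [2.320, 2.314, 2.262, 2.254],
})

g = toy.groupby(pd.Grouper(key='Date & Time (UTC)', freq='15min'))['Sea level (m)']

# .first() and .agg('first') are interchangeable here: each returns the earliest
# reading inside every 15-minute bin (2.320 for the 00:15 bin, 2.314 for 00:30, ...).
assert g.first().equals(g.agg('first'))

# fillna(0) only changes bins that contain no rows at all; reset_index turns the
# resulting Series (indexed by the bin start) back into a two-column DataFrame.
out = g.first().fillna(0).reset_index()
print(out)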