1

I have a large data set, a sample is given below. The data is recorded for 1 day with 5-min interval for 24 hours for 3214 unique ids.

  • The time and date information is missing in some rows as shown in the image. Is there a way to automatically fill the missing date and time for every unique_id? Every unique id has 288 datetime values (5 min * 12 hr * 24 hr = 288 values per day with 5 min increment in time interval)
  • Also, how to fill the missing unqiue ids with previous value?
  • I am trying to fill the last column VALUE with empty cells.

I searched for the relevant answers 1, 2 but it did not work.

The most relent answer was here, but the solutions are only for missing dates not for both date and times.

sample_data.csv

datetime    Unique_ID   VALUE
28/01/2018 2:05 105714  284
28/01/2018 2:10 105714  292
28/01/2018 2:15 105714  287
28/01/2018 2:20 105714  193
28/01/2018 2:25 105714  2348
28/01/2018 2:30 105714  284
28/01/2018 2:35 105714  286
28/01/2018 2:40 105714  293
28/01/2018 2:45 105714  229
28/01/2018 2:50 105714  2292
28/01/2018 2:55 105714  225
28/01/2018 4:45 105714  185
28/01/2018 4:50 105714  277
28/01/2018 4:55 105714  185
28/01/2018 5:00 105714  2435
28/01/2018 5:05 105714  273
28/01/2018 5:10 105714  2360
28/01/2018 5:15 105714  282

enter image description here

Case Msee
  • 405
  • 5
  • 17

1 Answers1

1

Here in pandas you can try:

Firstly get the index of nan values:

idx=df[df.isna().all(1)].index

Then use ffill() method:

df[['Unique_ID','datetime']]=df[['Unique_ID','datetime']].ffill()

Finally you can use groupby() method and transform() method:

df['VALUE']=df.groupby(['Unique_ID','datetime'],dropna=False).transform('last')

Now if you want to increment 'datetime' column value by 5 mins which are filled by ffill() method then:

df.loc[idx,'datetime']=df.loc[idx,'datetime']+pd.to_timedelta(5,unit='m')
Anurag Dabas
  • 23,866
  • 9
  • 21
  • 41
  • 2
    Updated my answer kindly have a look **:)** – Anurag Dabas Apr 24 '21 at 16:03
  • The above solution doesn,t worked for me as it depends on `nan` values. The data I am trying doesn,t have `nan` value or any blank spaces. Rather a missing information between the time intervals as shown here. https://i.stack.imgur.com/xNcb5.png – Case Msee Apr 25 '21 at 13:22
  • For instance, there are `22 values of datetime are missing` in the above shared image. I am trying for a solution which will automatically fill the `datetime` value with 5 min increment for 24 hrs (288 values in total for 24 hr). – Case Msee Apr 25 '21 at 13:24
  • next time pls be more specific and clear when asking question as no where in the question you are mentioning that missing is not refferring to `nan` but the values between the 2 dates – Anurag Dabas Apr 25 '21 at 13:42