I am trying to group a datetime column by a time interval. The interval can be 5, 10, or 15 minutes, or some other value. After grouping, I need to keep the latest (highest) datetime of each group as the record. Then I need to merge the dictionary columns so that each key keeps its most recent value. Something like this:

------------------------------------------------------------------------------------
created_date                 | parameter_value     | device_info
------------------------------------------------------------------------------------
2020-09-23 06:21:00.114104   | {'humidity': 60}    | {'device': 'env', 'battery': 50}
2020-09-23 06:37:00.310054   | {"pressure": 54}    | {'device': 'env', 'battery': 49}
2020-09-23 08:00:00.126441   | {'humidity': 34}    | {'device': 'env', 'battery': 45}
2020-09-23 08:21:00.140708   | {'temperature': 35} | {'device': 'env', 'battery': 40}
2020-09-23 08:25:00.099155   | {'temperature': 24} | {'device': 'env', 'battery': 39}

The result I want looks something like this:

------------------------------------------------------------------------------------------
created_date        | parameter_value                  | device_info
------------------------------------------------------------------------------------------
2020-09-23 06:40:00 | {'humidity': 60, "pressure": 54} | {'device': 'env', 'battery': 49}
2020-09-23 08:00:00 | {'humidity': 34}                 | {'device': 'env', 'battery': 45}
2020-09-23 08:40:00 | {'temperature': 24}              | {'device': 'env', 'battery': 39}

I have tried grouping the datetime with df.groupby(pd.Grouper(key='created_date', freq='20min')), but how do I merge the parameter_value and device_info columns with an aggregate function so that the latest key-value pairs are kept?

Rahul Kumar

1 Answer

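Use GroupBy.agg with a lambda function that merges the dictionaries (the approach is the solution from this). Because the dictionaries are merged in row order, a key that appears more than once keeps the value from the latest row, provided the rows are sorted by created_date. Finally, remove rows that contain only empty dictionaries by converting them to boolean and filtering with DataFrame.any: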

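# Bin rows into 20-minute intervals and merge each bin's dictionaries in row
# order, so a later row overwrites an earlier value for the same key.
df = (df.groupby(pd.Grouper(key='created_date', freq='20min'))
        .agg(lambda x: {k: v for d in x for k, v in d.items()}))
# Drop bins in which every column is an empty dictionary (no source rows).
df = df[df.astype(bool).any(axis=1)]
print(df)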
                                      parameter_value  \
created_date                                            
2020-09-23 06:20:00  {'humidity': 60, 'pressure': 54}   
2020-09-23 08:00:00                  {'humidity': 34}   
2020-09-23 08:20:00               {'temperature': 24}   

                                          device_info  
created_date                                           
2020-09-23 06:20:00  {'device': 'env', 'battery': 49}  
2020-09-23 08:00:00  {'device': 'env', 'battery': 45}  
2020-09-23 08:20:00  {'device': 'env', 'battery': 39} 
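
To make the merge rule easier to follow, here is a rough sketch of the same idea in plain Python, without pandas. It is only an illustration, not the code above: the function name merge_by_interval, the tuple layout of rows, and the "latest" field are assumptions made for this example. Timestamps are floored to the start of their 20-minute bin, the dictionaries are merged in chronological order so that later rows overwrite earlier keys, and the latest timestamp seen in each bin is kept alongside the merged values.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def merge_by_interval(rows, minutes=20):
    """Group (timestamp, params, info) rows into fixed bins and merge the dicts.

    Hypothetical helper for illustration; not part of pandas.
    """
    width = timedelta(minutes=minutes)
    origin = datetime(1970, 1, 1)  # bins are aligned to midnight
    bins = defaultdict(lambda: {"latest": None, "params": {}, "info": {}})
    # Process rows in chronological order so later rows overwrite earlier keys.
    for ts, params, info in sorted(rows, key=lambda r: r[0]):
        start = origin + ((ts - origin) // width) * width  # floor to bin start
        group = bins[start]
        group["latest"] = ts  # last assignment is the latest timestamp in the bin
        group["params"].update(params)
        group["info"].update(info)
    return dict(bins)


# The five rows from the question.
rows = [
    (datetime(2020, 9, 23, 6, 21, 0, 114104), {"humidity": 60}, {"device": "env", "battery": 50}),
    (datetime(2020, 9, 23, 6, 37, 0, 310054), {"pressure": 54}, {"device": "env", "battery": 49}),
    (datetime(2020, 9, 23, 8, 0, 0, 126441), {"humidity": 34}, {"device": "env", "battery": 45}),
    (datetime(2020, 9, 23, 8, 21, 0, 140708), {"temperature": 35}, {"device": "env", "battery": 40}),
    (datetime(2020, 9, 23, 8, 25, 0, 99155), {"temperature": 24}, {"device": "env", "battery": 39}),
]

result = merge_by_interval(rows)
for start, group in result.items():
    print(start, group["latest"], group["params"], group["info"])
```

Only bins that actually receive a row are created here, so there is no separate step for dropping empty dictionaries as in the pandas version.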
jezrael