I'm trying to apply the following two-input function in an aggregation statement, but I'm getting a `TypeError: unhashable type: 'list'`:
```python
from datetime import datetime

import pandas as pd


def process_difftime(x, y, start, final):
    # Collect the timestamps of the `start` and `final` instances within one group
    t1 = []
    t2 = []
    for i in x.index:
        if x[i] == start:
            t1.append(y[i])
        elif x[i] == final:
            t2.append(y[i])
    # Hours between the latest `final` event and the latest `start` event
    res = round((max(t2) - max(t1)).total_seconds() / 3600, 2)
    return res


List0 = pd.Series(['A10000', 'A10000', 'A10001', 'A10001'], index=[2, 3, 4, 5])
List1 = pd.Series(['A_Create', 'A_Accepted', 'A_Create', 'A_Accepted'], index=[2, 3, 4, 5])
List2 = pd.Series(['2016-08-03 15:57:21', '2016-08-03 16:57:21',
                   '2016-08-03 15:57:21', '2016-08-03 19:57:21'], index=[2, 3, 4, 5])
List2 = pd.Series([datetime.strptime(x, '%Y-%m-%d %H:%M:%S') for x in List2], index=[2, 3, 4, 5])

df = pd.DataFrame({
    'code': List0,
    'instance': List1,
    'timestamp': List2,
})

# This call raises: TypeError: unhashable type: 'list'
df.groupby(['code']) \
    .agg(
        a_concept_difftime=(['instance', 'timestamp'],
                            lambda x, y: process_difftime(x, y, 'A_Create', 'A_Accepted'))
    )
```
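A likely cause, offered as a hedged reading rather than a confirmed diagnosis: in named aggregation each tuple is `(column, aggfunc)`, where `column` is expected to be a single column label. The list `['instance', 'timestamp']` appears to end up being used as a lookup key, which would produce the unhashable-type error. Even with a valid label, `agg` hands the function one column (one Series) at a time, so a two-argument lambda cannot receive both columns. `groupby(...).apply` instead passes each group's sub-DataFrame, so both columns are available. A minimal sketch, assuming the sample data from the question (rebuilt here with `pd.to_datetime` and a default index so the block runs on its own):

```python
import pandas as pd


def process_difftime(x, y, start, final):
    """Hours between the latest `final` and latest `start` event, rounded to 2 dp."""
    t1 = []
    t2 = []
    for i in x.index:
        if x[i] == start:
            t1.append(y[i])
        elif x[i] == final:
            t2.append(y[i])
    return round((max(t2) - max(t1)).total_seconds() / 3600, 2)


df = pd.DataFrame({
    'code': ['A10000', 'A10000', 'A10001', 'A10001'],
    'instance': ['A_Create', 'A_Accepted', 'A_Create', 'A_Accepted'],
    'timestamp': pd.to_datetime(['2016-08-03 15:57:21', '2016-08-03 16:57:21',
                                 '2016-08-03 15:57:21', '2016-08-03 19:57:21']),
})

# Selecting the two columns first keeps the grouping column out of each group;
# apply() then receives a sub-DataFrame, so both Series can be passed on.
result = (
    df.groupby('code')[['instance', 'timestamp']]
      .apply(lambda g: process_difftime(g['instance'], g['timestamp'],
                                        'A_Create', 'A_Accepted'))
      .rename('a_concept_difftime')
      .reset_index()
)
print(result)
```

Because the lambda returns a scalar per group, `apply` yields a Series indexed by `code`, which `rename` and `reset_index` turn into the two-column layout shown in the desired output.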
Any suggestions?
Desired output:

```
code     a_concept_difftime
A10000   1.0
A10001   4.0
```
Additional details: I'm working with a large event-log dataset that corresponds to the execution of a semi-standardized process. There are about 60 different instances (stages of the process) and 3 different timestamps (schedule, start, complete). The goal of the function is to take the instance column and a chosen timestamp type and calculate the difference in hours between two given instances (the pair of instances can change).
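For a large log with many instance pairs and several timestamp types, a vectorized alternative to the per-group loop may scale better. The sketch below is one possible approach, not a drop-in answer: it filters to the two instances, takes the latest timestamp per case and instance with `groupby(...).max()`, reshapes with `unstack`, and subtracts. The helper name `instance_difftime` and its `ts_col`/`key` parameters are hypothetical; `ts_col` stands in for whichever timestamp column (for example a schedule, start, or complete column, whose real names are assumed) should be used.

```python
import pandas as pd


def instance_difftime(df, start, final, ts_col='timestamp', key='code'):
    """Hypothetical helper: hours from the latest `start` to the latest `final`
    event per `key`, using the timestamp column named by `ts_col`."""
    # Keep only the two instances of interest.
    sub = df[df['instance'].isin([start, final])]
    # One row per case, one column per instance, holding the latest timestamp.
    # Cases missing one of the two instances get NaT there, hence NaN hours.
    wide = sub.groupby([key, 'instance'])[ts_col].max().unstack('instance')
    hours = (wide[final] - wide[start]).dt.total_seconds() / 3600
    return hours.round(2).rename('a_concept_difftime')


df = pd.DataFrame({
    'code': ['A10000', 'A10000', 'A10001', 'A10001'],
    'instance': ['A_Create', 'A_Accepted', 'A_Create', 'A_Accepted'],
    'timestamp': pd.to_datetime(['2016-08-03 15:57:21', '2016-08-03 16:57:21',
                                 '2016-08-03 15:57:21', '2016-08-03 19:57:21']),
})

res = instance_difftime(df, 'A_Create', 'A_Accepted', ts_col='timestamp')
print(res.reset_index())
```

Switching to another instance pair or timestamp type then only changes the arguments, e.g. `instance_difftime(df, 'A_Create', 'A_Accepted', ts_col='start')` if such a column exists. As written, both `start` and `final` must occur somewhere in the data, otherwise `wide[final]` or `wide[start]` raises a `KeyError`.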