I'm new to Python; I usually write in R. I am trying to turn a simple loop into parallel code in Python, but I'm stuck and can't get it to work. I would be glad if someone could help me.
I have a sequence of events for each ID. At the end of each sequence, Y is either 1 or 0. I want to take that data frame and convert it into one long array: each ID's events in time order, followed by a marker for that ID's Y value.
Input:
ID   EventDateTime  Event      Y
147  03:48.8        Completed  0
147  10:28.5        Completed  0
669  58:35.7        Login      1
669  58:48.9        Login      1
669  59:58.2        Login      1
669  00:09.8        Login      1
669  13:01.1        Login      1
669  13:09.5        Login      1
669  31:15.5        Login      1
669  49:19.2        Login      1
669  44:38.6        Login      1
669  44:48.3        Login      1
669  28:51.9        Login      1
669  29:19.4        Login      1
669  33:36.3        Login      1
53   45:05.8        Completed  0
68   02:03.6        LogOut     1
68   05:52.3        Completed  1
68   08:29.4        LogOut     1
Output:
Completed
Completed
Y=0
Login
Login
Login
Login
Login
Login
Login
Login
Login
Login
Login
Login
Login
Y=1
LogOut
Completed
LogOut
Y=1
Here is the script:
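import numpy as np

# Data is the pandas DataFrame shown above (columns: ID, EventDateTime, Event, Y)
PreparationData = np.array([])
for ID in Data['ID'].unique():
    # Select this ID's rows; compare with ID itself (not str(ID)) so the dtypes match
    Events = Data.loc[Data['ID'] == ID]
    Events = Events.sort_values(by='EventDateTime', ascending=True)
    # Y is assumed constant within an ID; .all() avoids an ambiguous array truth value
    if (Events['Y'] == 0).all():
        Event = np.append(Events['Event'].to_numpy(), 'Y=0')
    else:
        Event = np.append(Events['Event'].to_numpy(), 'Y=1')
    # Append this ID's events and its Y marker to the long array
    PreparationData = np.append(PreparationData, Event)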
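Before going parallel, it may be worth noting that the loop can probably be sped up on its own, because every `np.append` call copies the whole array. The following is only a sketch of the same logic with `groupby` and a plain Python list. The function name `prepare_sequential` is hypothetical, and the code assumes that `Y` is constant within each `ID` and that `EventDateTime` sorts correctly in the form it is stored.

```python
import numpy as np
import pandas as pd


def prepare_sequential(data: pd.DataFrame) -> np.ndarray:
    """Return each ID's events in time order, each followed by 'Y=0' or 'Y=1'."""
    pieces = []
    # sort=False keeps IDs in order of first appearance, like Data['ID'].unique()
    for _, events in data.groupby("ID", sort=False):
        events = events.sort_values(by="EventDateTime", ascending=True)
        # Assumption: Y takes a single value per ID
        label = "Y=0" if (events["Y"] == 0).all() else "Y=1"
        pieces.extend(events["Event"].tolist())
        pieces.append(label)
    # Build the array once at the end instead of re-copying it per ID
    return np.array(pieces)
```

Calling `prepare_sequential(Data)` on the input above should give the same long array as the loop.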
For anyone who knows both R and Python, here is how I would write what I need in R:
doSNOW::registerDoSNOW(cl <- snow::makeCluster(4))
PreparationData <- ddply(.data = Data,
                         .variables = "ID",
                         .parallel = TRUE,
                         .fun = function(Events) {
                           Events <- Events[order(Events$EventDateTime), ]
                           if (unique(Events$Y) <= 0) {
                             return(data.frame(Events = c(Events$Event, "Y=0")))
                           } else {
                             return(data.frame(Events = c(Events$Event, "Y=1")))
                           }
                         })
snow::stopCluster(cl)
How can I write this in parallel in Python? Thanks.
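One possible direction, roughly analogous to `.parallel=TRUE` in `ddply`, is to split the frame by `ID` and hand each group to a worker process with the standard library's `concurrent.futures.ProcessPoolExecutor`. This is a minimal sketch, not a tuned solution. The names `events_with_label` and `prepare_parallel` are hypothetical, the default of 4 workers simply mirrors `makeCluster(4)`, and the code assumes `Y` is constant within each `ID`.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np
import pandas as pd


def events_with_label(events: pd.DataFrame) -> list:
    """Return one ID's events in time order, followed by its 'Y=0' / 'Y=1' marker."""
    events = events.sort_values(by="EventDateTime", ascending=True)
    # Mirrors the R test unique(Events$Y) <= 0; assumes a single Y value per ID
    label = "Y=0" if events["Y"].iloc[0] <= 0 else "Y=1"
    return events["Event"].tolist() + [label]


def prepare_parallel(data: pd.DataFrame, workers: int = 4) -> np.ndarray:
    """Apply events_with_label to every ID group in separate worker processes."""
    # sort=False keeps IDs in order of first appearance
    groups = [group for _, group in data.groupby("ID", sort=False)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the input groups
        pieces = list(pool.map(events_with_label, groups))
    # Flatten the per-ID lists into one long array
    return np.array([event for piece in pieces for event in piece])
```

On platforms that start workers by spawning a fresh interpreter (Windows, and macOS by default), `prepare_parallel(Data)` should be called from inside an `if __name__ == "__main__":` guard. Each group is pickled and sent to a worker, so for a small frame with cheap per-group work the overhead can easily outweigh the gain. It is worth timing this against the plain loop before adopting it.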