
I'm new to Python; I usually write in R. I am trying to turn a simple loop into a parallel one in Python, but I'm stuck. I would be glad if someone could help.

I have a sequence of events for each ID. At the end of each sequence, Y is either 1 or 0. I want to take that data frame and convert it to one long array: each ID's events in time order, followed by a marker for that ID's Y value.

Input:

ID  EventDateTime   Event       Y
147 03:48.8         Completed   0
147 10:28.5         Completed   0
669 58:35.7         Login       1
669 58:48.9         Login       1
669 59:58.2         Login       1
669 00:09.8         Login       1
669 13:01.1         Login       1
669 13:09.5         Login       1
669 31:15.5         Login       1
669 49:19.2         Login       1
669 44:38.6         Login       1
669 44:48.3         Login       1
669 28:51.9         Login       1
669 29:19.4         Login       1
669 33:36.3         Login       1
53  45:05.8         Completed   0
68  02:03.6         LogOut      1
68  05:52.3         Completed   1
68  08:29.4         LogOut      1
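
As a testable version of the table above, the same rows can be built as a pandas DataFrame. The dtypes are assumptions (ID and Y as integers, EventDateTime kept as the strings shown), since the original loading code is not included here.

```python
import pandas as pd

# Rows copied from the Input table. Dtypes are assumed: ID and Y as int,
# EventDateTime as the displayed strings.
rows = [
    (147, "03:48.8", "Completed", 0),
    (147, "10:28.5", "Completed", 0),
    (669, "58:35.7", "Login", 1),
    (669, "58:48.9", "Login", 1),
    (669, "59:58.2", "Login", 1),
    (669, "00:09.8", "Login", 1),
    (669, "13:01.1", "Login", 1),
    (669, "13:09.5", "Login", 1),
    (669, "31:15.5", "Login", 1),
    (669, "49:19.2", "Login", 1),
    (669, "44:38.6", "Login", 1),
    (669, "44:48.3", "Login", 1),
    (669, "28:51.9", "Login", 1),
    (669, "29:19.4", "Login", 1),
    (669, "33:36.3", "Login", 1),
    (53, "45:05.8", "Completed", 0),
    (68, "02:03.6", "LogOut", 1),
    (68, "05:52.3", "Completed", 1),
    (68, "08:29.4", "LogOut", 1),
]
Data = pd.DataFrame(rows, columns=["ID", "EventDateTime", "Event", "Y"])
```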

Output:

Completed
Completed
Y=0
Login
Login
Login
Login
Login
Login
Login
Login
Login
Login
Login
Login
Login
Y=1
LogOut
Completed
LogOut
Y=1

Here is the script:

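import numpy as np
import pandas as pd

# Data is a pandas DataFrame with the columns ID, EventDateTime, Event, Y
PreparationData = np.array([])
for ID in Data['ID'].unique():
    # Compare with ID directly; str(ID) matches nothing if the column is numeric
    Events = Data.loc[Data['ID'] == ID]
    Events = Events.sort_values(by='EventDateTime', ascending=True)
    # Y is the same for every row of an ID, so the first value decides the marker
    Label = 'Y=0' if Events['Y'].iloc[0] == 0 else 'Y=1'
    # Append this ID's events, then its marker, to the long array
    Event = np.append(Events['Event'].to_numpy(), Label)
    PreparationData = np.append(PreparationData, Event)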

For anyone who knows both R and Python, here is how I write what I need in R:

library(plyr)  # provides ddply
doSNOW::registerDoSNOW(cl<-snow::makeCluster(4))
PreparationData<-ddply(.data = Data,
                       .variables = "ID",
                       .parallel=TRUE,
                       .fun = function(Events){

                         Events<-Events[order(Events$EventDateTime),]
                         if(unique(Events$Y)<=0){
                           return(data.frame(Events=c(Events$Event,"Y=0")))
                         }else{
                           return(data.frame(Events=c(Events$Event,"Y=1")))
                         }

                       })
snow::stopCluster(cl)

How can I write this in parallel in Python? Thanks.
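
One possible direction, as a minimal sketch and not a tested answer: split the data frame by ID with `groupby` and hand each group to a worker process through the standard library's `concurrent.futures.ProcessPoolExecutor`. The function names `label_events` and `prepare` and the `workers` parameter are made up for illustration, and the sketch assumes Y is constant within each ID, as in the sample data.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np
import pandas as pd


def label_events(events: pd.DataFrame) -> np.ndarray:
    """Return one ID's events in time order, followed by its 'Y=0' or 'Y=1' marker."""
    events = events.sort_values(by="EventDateTime")
    # Assumes Y is the same on every row of the group, so the first row decides.
    label = "Y=0" if events["Y"].iloc[0] == 0 else "Y=1"
    return np.append(events["Event"].to_numpy(), label)


def prepare(data: pd.DataFrame, workers: int = 1) -> np.ndarray:
    """Apply label_events to each ID group, in worker processes when workers > 1."""
    # sort=False keeps the IDs in order of first appearance, like Data['ID'].unique().
    groups = [group for _, group in data.groupby("ID", sort=False)]
    if workers > 1:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            # pool.map returns results in the same order as the input groups.
            parts = list(pool.map(label_events, groups))
    else:
        parts = [label_events(group) for group in groups]
    return np.concatenate(parts)
```

With the data frame from the question this would be called as `prepare(Data, workers=4)`, ideally under an `if __name__ == "__main__":` guard so that worker processes can be started safely on every platform. Each group has to be pickled and sent to a worker, so for many small groups the process overhead may well outweigh the gain; `workers=1` runs the same logic serially for comparison.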

dmitriy
  • post a testable dataframe as code not as image – RomanPerekhrest Aug 25 '19 at 09:29
  • @dmitriy: Add a `list of tuple` or `list of dict` or `csv` from your `DATA` **as text** to your Question. – stovfl Aug 25 '19 at 10:13
  • @RomanPerekhrest I converted it to text – dmitriy Aug 25 '19 at 10:31
  • @dmitriy: ***"converted it to text"***: Better, but still not a [mcve]. Still missing `import ...` and `DATA = ???()`. Also read [multiprocessing-vs-threading-python](https://stackoverflow.com/questions/3044580/multiprocessing-vs-threading-python), then [edit] your Question to clarify your statement ***"parallel in python"***. – stovfl Aug 25 '19 at 11:33
