I have this DataFrame:

V1 V2    
1  1    
0  0    
0  0    
0  0    
1  0    
1  1    
0  1

I need to compare the V1 and V2 values in each row, together with a flag variable called comienzo, and set a third column named Label according to this function:

def labeling(DataFrame):  
    comienzo=0 #flag to indicate event started (1= started, 0= no started)    
    for n in range(len(DataFrame)):
        if ((DataFrame.iloc[n]['V1']==1 & DataFrame.iloc[n]['V2']==1) & comienzo == 0 ) :
            comienzo=1
            DataFrame['Label']='Start'
        elif ((DataFrame.iloc[n]['V1']==0 & DataFrame.iloc[n]['V2']==0) & comienzo == 1 ) :
            comienzo=0
            DataFrame['Label']='End'
    return DataFrame
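As written, the loop assigns DataFrame['Label']='Start' (or 'End'), which overwrites the entire column on every match instead of labeling a single row. Below is a minimal sketch of what the loop appears to intend. It assumes the columns are named V1 and V2 and that rows without a transition should be NaN. The helper name label_rows is hypothetical.

```python
import numpy as np
import pandas as pd


def label_rows(df):
    """Return a copy of df with one Label per row (hypothetical helper)."""
    comienzo = 0  # flag: 1 = event started, 0 = not started
    labels = []
    for v1, v2 in zip(df['V1'], df['V2']):
        if v1 == 1 and v2 == 1 and comienzo == 0:
            comienzo = 1
            labels.append('Start')
        elif v1 == 0 and v2 == 0 and comienzo == 1:
            comienzo = 0
            labels.append('End')
        else:
            labels.append(np.nan)  # no transition on this row
    return df.assign(Label=labels)


# the input table from the question
df = pd.DataFrame([[1, 1], [0, 0], [0, 0], [0, 0], [1, 0], [1, 1], [0, 1]],
                  columns=['V1', 'V2'])
print(label_rows(df))
```

Collecting the labels in a list and attaching them once with assign avoids writing to the DataFrame inside the loop.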

I want to do this in a pandorable way using DataFrame.apply, so I tried this:

def labeling(x, comienzo):  
    if ((x['V1']==1 & x['V2']==1) & comienzo == 0 ) :
        comienzo=1
        Label='Start'
    elif ((x['V1']==0 & x['V2']==0) & comienzo == 1 ) :
        comienzo=0
        Label='End'
    return Label

comienzo=0 #I initialize 'comienzo' var to 0
DataFrame['Label']=DataFrame.apply(labmda x: labeling(x,comienzo),axis=1)

This works, but the values are incorrect. I think .apply doesn't take the variable comienzo into account.

Is it possible to make this code pandorable?

I want this output:

(comienzo starts at 0)
V1 V2 Label
1  1  Start   (comienzo=1)
0  1  NaN
0  0  End     (comienzo=0)
0  0  NaN
1  0  NaN
1  1  Start   (comienzo=1)
1  1  NaN
0  1  NaN
Parfait
Juan D

1 Answer

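You have a series of small mistakes, ranging from improper lambda usage (misspelling lambda) and not using apply properly with args (as noted above), to using & where I'm pretty sure you meant 'and' in your conditional logic. Because & binds more tightly than ==, x['V1']==1 & x['V2']==1 is parsed as the chained comparison x['V1'] == (1 & x['V2']) == 1, not as two separate tests.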
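A small pure-Python sketch of the precedence problem, using the "end" condition from the question with scalar values. The three function names are made up for illustration.

```python
def end_with_amp(v1, v2):
    # & binds tighter than ==, so this is the chained comparison
    # v1 == (0 & v2) == 0, i.e. (v1 == 0) and (0 == 0): v2 is effectively ignored
    return v1 == 0 & v2 == 0


def end_with_and(v1, v2):
    # 'and' binds looser than ==, so both comparisons are evaluated first
    return v1 == 0 and v2 == 0


def end_with_parens(v1, v2):
    # & also works once each comparison is parenthesized
    # (this is the form pandas needs when combining boolean Series)
    return (v1 == 0) & (v2 == 0)


print(end_with_amp(0, 1))     # True, although V2 is 1
print(end_with_and(0, 1))     # False
print(end_with_parens(0, 1))  # False
```

With the row V1=0, V2=1 the & version reports an "end" even though V2 is not 0.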

Also, your input data has 7 rows whereas your desired output has 8, so it is technically impossible to map your input exactly to that output.

However, I think this is what you are trying to get at:

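import pandas as pd

DataFrame = pd.DataFrame(
        [[1,1],
         [0,1],
         [0,0],
         [0,0],
         [1,0],
         [1,1],
         [0,1]])
DataFrame.columns=['V1','V2']
DataFrame.insert(0, 'comienzo', 0)  # adds a column of zeros; labeling() does not read it

def labeling(x):
    global comienzo  # keep the flag's value across the per-row calls made by apply
    if x['V1']==1 and x['V2']==1 and comienzo == 0:
        comienzo=1
        return 'Start'
    elif x['V1']==0 and x['V2']==0 and comienzo == 1:
        comienzo=0
        return 'End'
    # any other row falls through and returns None (no label)

comienzo=0  # initialize the flag before apply starts
DataFrame['Label']=DataFrame.apply(labeling,axis=1)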

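Note that by declaring comienzo as global, its value is preserved between the per-row calls that apply makes. In your version, comienzo was passed in as an argument, so the assignment inside labeling only rebound a local name, and every call saw comienzo equal to 0.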
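A minimal pure-Python sketch of that difference. The two functions are hypothetical stand-ins for the two versions of labeling, reduced to the flag logic only.

```python
comienzo = 0


def label_with_arg(flag):
    # 'flag' is a local name; rebinding it does not touch the caller's variable
    if flag == 0:
        flag = 1
        return 'Start'
    return None


def label_with_global():
    global comienzo  # reads and rebinds the module-level name
    if comienzo == 0:
        comienzo = 1
        return 'Start'
    return None


# every call receives the unchanged module-level value 0
with_arg = [label_with_arg(comienzo) for _ in range(3)]
print(with_arg)      # ['Start', 'Start', 'Start']

comienzo = 0
with_global = [label_with_global() for _ in range(3)]
print(with_global)   # ['Start', None, None]
```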

That said, using globals is bad practice in many cases. Further reading: Why is it not possible to access other variables from inside the apply function in Python?
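If you would rather avoid both the global and the row-by-row apply, one option is a vectorized approach. This is a different technique from the answer above, offered only as a sketch. The flag simply becomes 1 on every row where V1 and V2 are both 1, becomes 0 on every row where both are 0, and otherwise keeps its previous value. A forward fill can compute that, and the labels are then the rows where the flag changes (0 to 1 is Start, 1 to 0 is End). The function name label_events is hypothetical, and the sketch assumes binary V1/V2 columns and that comienzo starts at 0.

```python
import numpy as np
import pandas as pd


def label_events(df):
    """Vectorized labeling (hypothetical helper); assumes comienzo starts at 0."""
    both_on = (df['V1'] == 1) & (df['V2'] == 1)
    both_off = (df['V1'] == 0) & (df['V2'] == 0)

    # 1 where the event is forced on, 0 where it is forced off, NaN elsewhere;
    # forward fill carries the last forced state, fillna(0) covers leading rows
    state = pd.Series(np.nan, index=df.index)
    state[both_on] = 1
    state[both_off] = 0
    state = state.ffill().fillna(0)

    # the flag's value on the previous row (0 before the first row)
    prev = state.shift(1, fill_value=0)

    label = pd.Series(np.nan, index=df.index, dtype=object)
    label[(state == 1) & (prev == 0)] = 'Start'
    label[(state == 0) & (prev == 1)] = 'End'
    return df.assign(Label=label)


df = pd.DataFrame([[1, 1], [0, 1], [0, 0], [0, 0], [1, 0], [1, 1], [0, 1]],
                  columns=['V1', 'V2'])
print(label_events(df))
```

Because the state at each row depends only on the most recent "both 1" or "both 0" row, no Python-level loop or shared variable is needed.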

Simon
  • What is the difference between the 'and' operator and the '&' operator? Thank you, now I know how to use global variables. Your answer was 100% useful to me. – Juan D Sep 04 '17 at 00:59
  • This is a great explanation: https://stackoverflow.com/questions/22646463/difference-between-and-boolean-vs-bitwise-in-python-why-difference-i Basically, 'and' is the logical operator, which is usually what you want for single True/False values, while & is the bitwise operator, which pandas and NumPy also use for element-wise comparisons of whole columns. PS: If you liked my answer please accept it so I get credit :) – Simon Sep 04 '17 at 01:19
  • I'm a newbie here on Stack Overflow. How can I accept your answer? When I click the up arrow, the vote system shows me: "Thanks for the feedback! Votes cast by those with less than 15 reputation are recorded, but do not change the publicly displayed post score." Of course, I liked your answer. Thank you. – Juan D Sep 06 '17 at 00:26
  • @JuanD To accept an answer, you must be the person who asked the question. Then, you click the "Checkmark" near the vote arrows, which means "This answer solved my problem". – Joseph Hansen Nov 05 '18 at 19:54
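To make the 'and' versus '&' point concrete for pandas: 'and' needs a single True/False value, so it cannot combine two boolean Series, while & (with parentheses around each comparison) combines them element-wise. A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'V1': [1, 0, 1], 'V2': [1, 1, 0]})

# element-wise: True only where both comparisons are True
both = (df['V1'] == 1) & (df['V2'] == 1)
print(both.tolist())  # [True, False, False]

try:
    (df['V1'] == 1) and (df['V2'] == 1)
    and_raised = False
except ValueError as err:
    # pandas refuses to reduce a whole Series to one bool
    # ("The truth value of a Series is ambiguous...")
    and_raised = True
    print(err)
```

Inside a row-wise function such as labeling, x['V1'] is a single value, which is why 'and' is the right choice there.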