
I'm quite new to pandas and wondered if anyone could help with the following:

I'm trying to use pandas to loop row by row over a DataFrame, and for each row I want to compare it to every row of another DataFrame (which has roughly 7 rows).

The DataFrame is fairly large (the index is at 10-minute frequency from May to October), and the nested for loop I have in place takes ages to run (about 20 minutes):

    frame['Group1 ON With Exception'] = ''
    for i in range(len(frame)):
        idx = frame.index[i]  # .ix was removed in pandas 1.0; use the row label with .loc
        for j in range(len(grp1_extpn_tbl)):
            if ((frame.loc[idx, 'T01\n(kWh) ':'T22\n(kWh) '] > 1) == (grp1_extpn_tbl.loc[j] > 0)).all():
                frame.loc[idx, 'Group1 ON With Exception'] = ''
                break
            else:
                frame.loc[idx, 'Group1 ON With Exception'] = 'NOT VALID GROUP1 DATA'
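For reference, here is a minimal reproducible sketch of the same loop on made-up data. The columns `T01`, `T02`, `T03`, the values and the two allowed patterns are hypothetical stand-ins for the real `'T01\n(kWh) '` to `'T22\n(kWh) '` columns and `grp1_extpn_tbl`. It also uses Python's `for ... else`, so the "not valid" label is written once, only when no pattern matched.

```python
import pandas as pd

# Hypothetical stand-ins: 3 meter columns instead of T01..T22
cols = ["T01", "T02", "T03"]
frame = pd.DataFrame(
    [[2.0, 0.0, 3.0],   # on/off pattern: on, off, on
     [0.5, 0.0, 0.0],   # off, off, off
     [2.0, 2.0, 2.0]],  # on, on, on
    columns=cols,
    index=pd.date_range("2016-05-01", periods=3, freq="10min"),
)
# Hypothetical allowed patterns: a value > 0 marks a column expected to be ON
grp1_extpn_tbl = pd.DataFrame([[1, 0, 1], [0, 0, 0]], columns=cols)

label_pos = None
frame["Group1 ON With Exception"] = ""
label_pos = frame.columns.get_loc("Group1 ON With Exception")
for i in range(len(frame)):
    row_on = frame[cols].iloc[i] > 1              # this row's on/off pattern
    for j in range(len(grp1_extpn_tbl)):
        if (row_on == (grp1_extpn_tbl.iloc[j] > 0)).all():
            break                                 # matches an allowed pattern
    else:                                         # runs only if no break happened
        frame.iloc[i, label_pos] = "NOT VALID GROUP1 DATA"

print(frame["Group1 ON With Exception"].tolist())
```

The first two rows match an allowed pattern and stay blank; the third matches none and is flagged.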

Obviously, with pandas the key is to avoid looping, so I came up with nested np.where calls instead, which speeds things up considerably (to roughly 3 minutes). The problem is that it's a rather cumbersome block of code, and I wondered whether there is an alternative, or a way to condense this block further:

    frame['Group1 ON With Exception'] = ''
    frame['Group1 ON With Exception'] =  np.where((frame.loc[:,'T01\n(kWh) ':'T22\n(kWh) ']).apply(lambda x: ((x>1) == (grp1_extpn_tbl.loc[0] > 0)).all(), axis=1),'',
                                         np.where((frame.loc[:,'T01\n(kWh) ':'T22\n(kWh) ']).apply(lambda x: ((x>1) == (grp1_extpn_tbl.loc[1] > 0)).all(), axis=1),'',
                                         np.where((frame.loc[:,'T01\n(kWh) ':'T22\n(kWh) ']).apply(lambda x: ((x>1) == (grp1_extpn_tbl.loc[2] > 0)).all(), axis=1),'',
                                         np.where((frame.loc[:,'T01\n(kWh) ':'T22\n(kWh) ']).apply(lambda x: ((x>1) == (grp1_extpn_tbl.loc[3] > 0)).all(), axis=1),'',
                                         np.where((frame.loc[:,'T01\n(kWh) ':'T22\n(kWh) ']).apply(lambda x: ((x>1) == (grp1_extpn_tbl.loc[4] > 0)).all(), axis=1),'',
                                         np.where((frame.loc[:,'T01\n(kWh) ':'T22\n(kWh) ']).apply(lambda x: ((x>1) == (grp1_extpn_tbl.loc[5] > 0)).all(), axis=1),'',
                                         np.where((frame.loc[:,'T01\n(kWh) ':'T22\n(kWh) ']).apply(lambda x: ((x>1) == (grp1_extpn_tbl.loc[6] > 0)).all(), axis=1),'','NOT VALID GROUP1 DATA')))))))
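The seven nested np.where calls only ask whether a row matches *any* of the seven patterns, so one possible condensation (a sketch, not tested on the real data) is to compare every row against every pattern at once with NumPy broadcasting and finish with a single `np.where`. Column names and data are hypothetical; it assumes the pattern table uses the same column names as the meter columns, which the original label-aligned comparison already requires.

```python
import numpy as np
import pandas as pd

cols = ["T01", "T02", "T03"]  # hypothetical stand-ins for T01..T22
frame = pd.DataFrame(
    [[2.0, 0.0, 3.0], [0.5, 0.0, 0.0], [2.0, 2.0, 2.0]], columns=cols
)
grp1_extpn_tbl = pd.DataFrame([[1, 0, 1], [0, 0, 0]], columns=cols)

on = frame[cols].to_numpy() > 1                # shape (n_rows, n_cols)
allowed = grp1_extpn_tbl[cols].to_numpy() > 0  # shape (n_patterns, n_cols)

# (n_rows, 1, n_cols) vs (1, n_patterns, n_cols) -> (n_rows, n_patterns, n_cols)
matches = (on[:, None, :] == allowed[None, :, :]).all(axis=2)  # row i matches pattern j?
valid = matches.any(axis=1)                                    # row i matches any pattern?

frame["Group1 ON With Exception"] = np.where(valid, "", "NOT VALID GROUP1 DATA")
print(frame["Group1 ON With Exception"].tolist())
```

The intermediate boolean array has n_rows × n_patterns × n_cols elements; for about six months of 10-minute data, 7 patterns and 22 columns that is a few million booleans, which should fit comfortably in memory.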

Hopefully the above is enough information; any help would be greatly appreciated.

Thanks,

JayBe
Please check [How to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and try create [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve) with desired output. – jezrael Dec 05 '16 at 12:31

1 Answer


Instead of using i and j to index into the DataFrame, use iterrows():

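    for index, row in df1.iterrows():
        # `in` on a Series checks its index labels, so test against .values
        if row['column'] in df2['column_to_compare_to'].values:
            do_something()       # placeholder
        else:
            do_something_else()  # placeholder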
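A runnable version of that iterrows() pattern, with hypothetical frames `df1`/`df2` and column names. It also demonstrates the pitfall that `x in series` looks at the Series index rather than its values; here the allowed values are collected into a set once, which gives fast membership tests.

```python
import pandas as pd

# Hypothetical data; df2 gets a non-default index to make the pitfall visible
df1 = pd.DataFrame({"column": [1, 5, 3]})
df2 = pd.DataFrame({"column_to_compare_to": [3, 4, 5]}, index=[10, 11, 12])

allowed = set(df2["column_to_compare_to"])  # built once, outside the loop
result = []
for index, row in df1.iterrows():
    if row["column"] in allowed:
        result.append("match")
    else:
        result.append("no match")

print(result)
# Pitfall: `in` on a Series checks index labels, not values
print(5 in df2["column_to_compare_to"])         # False: 5 is not an index label
print(5 in df2["column_to_compare_to"].values)  # True: 5 is one of the values
```

Note that iterrows() is still a Python-level loop, so on tens of thousands of rows it will be much slower than a vectorized comparison.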

Alternatively, why not loop through the small DataFrame (about 7 rows) and select from the big DataFrame? If the selection comes back empty, do one thing; if it returns rows, do something with those. Hope that helps!
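One way to read that second suggestion, sketched on hypothetical columns and data: loop over the handful of pattern rows instead of the thousands of data rows, build a boolean mask on the big frame for each pattern (the "selection"), and OR the masks together. Rows covered by no mask are the invalid ones.

```python
import numpy as np
import pandas as pd

cols = ["T01", "T02", "T03"]  # hypothetical stand-ins for T01..T22
frame = pd.DataFrame(
    [[2.0, 0.0, 3.0], [0.5, 0.0, 0.0], [2.0, 2.0, 2.0]], columns=cols
)
grp1_extpn_tbl = pd.DataFrame([[1, 0, 1], [0, 0, 0]], columns=cols)

on = frame[cols] > 1                          # on/off pattern for every row at once
valid = pd.Series(False, index=frame.index)
for _, pattern in grp1_extpn_tbl.iterrows():  # only ~7 iterations on the real data
    # Rows equal to this pattern; eq(..., axis=1) aligns the pattern on column labels
    valid |= on.eq(pattern[cols] > 0, axis=1).all(axis=1)

frame["Group1 ON With Exception"] = np.where(valid, "", "NOT VALID GROUP1 DATA")
print(frame["Group1 ON With Exception"].tolist())
```

Each iteration is a whole-frame vectorized comparison, so the Python loop runs only as many times as there are patterns.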

Eric