iterate among two columns of a dataframe

Question

I am trying to iterate over two columns of a dataframe ('binS99', 'bin3HMax'). Those columns hold values from 0 to 4. I would then like to create a new column ('Probability') in the same dataframe (df_selection), taking its values from the matrix prob. The following code seems to get stuck in a loop. Any ideas on how to solve this? Thank you.

prob = [[0, 0.00103, 0.00103],
        [0, 0.00267, 0.00311],
        [0, 0.00688, 0.01000],
        [0, 0.01777, 0.03218]]

for index, row in df_selection.iterrows():
    a = int(df_selection.loc[index, "binS99"])  # int(str(row["binS99"]))
    b = int(df_selection.loc[index, "bin3HMax"])  # int(str(row["bin3HMax"]))

    df_selection.loc[index, "Probability"] = prob[a][b]


Iterating over a dataframe with a Python loop is highly discouraged. For why, see https://stackoverflow.com/a/55557758/8479618. You should almost always use built-in operations, as they are optimized under the hood. Would you mind posting what your df looks like instead? I see a potential solution using dictionaries and vectorizing the operation. — Jeff, Aug 31 '20 at 06:54
Welcome to Stack Overflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. — jezrael, Aug 31 '20 at 07:13
Hi, I see. The dataframe has 17 columns and 22985 rows. I only need the two columns cited above. — Luca Piciullo, Aug 31 '20 at 07:35

score 0 · Accepted Answer · answered Aug 31 '20 at 07:18

I believe you first need to check that the maximum values in both columns fit within the dimensions of the nested list (binS99 indexes its rows, bin3HMax its columns), and then use numpy indexing:

import numpy as np
import pandas as pd

df_selection = pd.DataFrame({
    'A': list('abcdef'),
    'binS99': [0, 1, 2, 0, 2, 1],
    'bin3HMax': [1, 2, 1, 0, 1, 0],
})
print (df_selection)
   A  binS99  bin3HMax
0  a       0         1
1  b       1         2
2  c       2         1
3  d       0         0
4  e       2         1
5  f       1         0

prob =  [[0,   0.00103,    0.00103],
         [0,    0.00267,    0.00311],
         [0,    0.00688,    0.01000],
         [0,    0.01777,    0.03218]]

arr_prob = np.array(prob)
print (arr_prob)
[[0.      0.00103 0.00103]
 [0.      0.00267 0.00311]
 [0.      0.00688 0.01   ]
 [0.      0.01777 0.03218]]

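# Row and column positions for every dataframe row
a = df_selection['binS99'].to_numpy()
b = df_selection['bin3HMax'].to_numpy()

# Pairs a[i], b[i] are looked up in one vectorized step
df_selection['Probability'] = arr_prob[a, b]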
print (df_selection)
   A  binS99  bin3HMax  Probability
0  a       0         1      0.00103
1  b       1         2      0.00311
2  c       2         1      0.00688
3  d       0         0      0.00000
4  e       2         1      0.00688
5  f       1         0      0.00000
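
The check mentioned at the top of this answer could be sketched as follows. This is only an illustration under assumptions: the sample values (including the out-of-range 4 and 3) and the names `valid` and `probability` are made up here, and filling unmatched rows with NaN is one possible choice, not necessarily what the asker wants. With the 4x3 matrix from the question, a bin value of 4 in binS99 would otherwise raise an IndexError.

```python
import numpy as np
import pandas as pd

prob = [[0, 0.00103, 0.00103],
        [0, 0.00267, 0.00311],
        [0, 0.00688, 0.01000],
        [0, 0.01777, 0.03218]]
arr_prob = np.array(prob)

# Hypothetical sample data with two deliberately out-of-range bins
df_selection = pd.DataFrame({
    "binS99": [0, 1, 2, 0, 4, 1],    # 4 exceeds the last row index (3)
    "bin3HMax": [1, 2, 1, 0, 1, 3],  # 3 exceeds the last column index (2)
})

a = df_selection["binS99"].astype(int).values
b = df_selection["bin3HMax"].astype(int).values

# A row is valid only if both of its bins fit inside arr_prob
n_rows, n_cols = arr_prob.shape
valid = (a >= 0) & (a < n_rows) & (b >= 0) & (b < n_cols)

# Start from NaN, then look up only the rows that are in range
probability = np.full(len(df_selection), np.nan)
probability[valid] = arr_prob[a[valid], b[valid]]
df_selection["Probability"] = probability

print(df_selection)
```

Rows that fail the check keep NaN in 'Probability', so they can be inspected afterwards with `df_selection[~valid]` instead of stopping the whole computation.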

This is the error I get: `return object.__getattribute__(self, name)` `AttributeError: 'Series' object has no attribute 'to_numpy'` — Luca Piciullo, Aug 31 '20 at 07:49
@LucaPiciullo - Change `.to_numpy()` to `.values` (without `()`) or upgrade pandas — jezrael, Aug 31 '20 at 07:49
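
If the same script has to run on both old and new pandas installations, the two options from the comment above can be combined. This is a minimal sketch; the helper name `column_as_array` is hypothetical and not part of pandas. `Series.to_numpy()` was added in pandas 0.24, while `.values` is also available in older releases.

```python
import numpy as np
import pandas as pd


def column_as_array(series):
    """Return a Series as a NumPy array on old and new pandas alike."""
    if hasattr(series, "to_numpy"):  # available from pandas 0.24 onwards
        return series.to_numpy()
    return series.values  # fallback for older pandas versions


# Hypothetical sample column
bins = pd.Series([0, 1, 2, 0, 2, 1])
arr = column_as_array(bins)
print(type(arr), arr)
```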
