I am iterating over a large DataFrame using df.iterrows or df.itertuples, following the example from this question: "Python Pandas iterate over rows and access column names".


import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10,4),columns=list('ABCD'))
print(df)
          A         B         C         D
0  0.351741  0.186022  0.238705  0.081457
1  0.950817  0.665594  0.671151  0.730102
2  0.727996  0.442725  0.658816  0.003515
3  0.155604  0.567044  0.943466  0.666576
4  0.056922  0.751562  0.135624  0.597252
5  0.577770  0.995546  0.984923  0.123392
6  0.121061  0.490894  0.134702  0.358296
7  0.895856  0.617628  0.722529  0.794110
8  0.611006  0.328815  0.395859  0.507364
9  0.616169  0.527488  0.186614  0.278792

From the above DataFrame, I am trying to reference a specific column in a specific row (for instance, the previous row), but I am getting errors. For example:

for row in df.iterrows():
    if row.loc[1,'A'] > 0.95:
       temp_val = row.loc[0,'A']
    else: 
       temp_val = row.loc[0,'B']

Thanks!

Alex Man
    Do you mean `m = df.loc[1, 'A'] > 0.95` and `np.where(m, df.loc[0, 'A'], df.loc[0, 'B'])`? – anky May 12 '19 at 17:22

1 Answer


You can do this much more efficiently using np.where and DataFrame.shift:

import numpy as np
np.where(df['A'].gt(0.95), df['A'].shift(), df['B'].shift())
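
To see what that one-liner produces, here is a minimal self-contained sketch. The small hand-made frame and the names prev_a, prev_b and result are assumptions for illustration, not the data from the question. Note that shift() leaves NaN in the first row, since it has no previous row.

```python
import numpy as np
import pandas as pd

# Small hand-made frame (illustrative values, not the question's random data).
df = pd.DataFrame({
    "A": [0.10, 0.97, 0.50, 0.99],
    "B": [0.20, 0.30, 0.40, 0.60],
})

# shift() moves each column down one row, so position i holds row i-1's value.
prev_a = df["A"].shift()
prev_b = df["B"].shift()

# Where the current A exceeds 0.95 take the previous A, otherwise the previous B.
df["result"] = np.where(df["A"].gt(0.95), prev_a, prev_b)
print(df)
```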

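The problem with your code is that df.iterrows() yields (index, Series) tuples, so in `for row in df.iterrows()` the name row is bound to a tuple, which has no .loc. Unpack the tuple into the index and the row, and look up the previous row on the DataFrame itself (this assumes the default integer index, so that ix-1 is the label of the previous row). Here's a way you could do it: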

df['result'] = np.nan
for ix, row in df.loc[1:,:].iterrows():
    if row.loc['A'] > 0.95:
        df.loc[ix, 'result'] = df.loc[ix-1,'A']
    else: 
        df.loc[ix, 'result'] = df.loc[ix-1,'B']
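
As a quick sanity check, the sketch below shows what iterrows() actually yields and that the loop gives the same values as the np.where version. It uses the same small hand-made frame as an assumption; loop_result and vec_result are hypothetical column names.

```python
import numpy as np
import pandas as pd

# Illustrative data with the default integer index (0, 1, 2, 3).
df = pd.DataFrame({
    "A": [0.10, 0.97, 0.50, 0.99],
    "B": [0.20, 0.30, 0.40, 0.60],
})

# iterrows() yields (index_label, row_as_Series) pairs, not bare rows.
first = next(df.iterrows())
ix0, row0 = first
print(type(first).__name__, ix0, type(row0).__name__)

# Loop version: start at the second row, since row 0 has no previous row.
df["loop_result"] = np.nan
for ix, row in df.loc[1:, :].iterrows():
    col = "A" if row["A"] > 0.95 else "B"
    df.loc[ix, "loop_result"] = df.loc[ix - 1, col]

# Vectorized version from the top of the answer.
df["vec_result"] = np.where(df["A"].gt(0.95), df["A"].shift(), df["B"].shift())

# Compare the two columns, treating the NaN in the first row as equal.
same = np.allclose(
    df["loop_result"].to_numpy(), df["vec_result"].to_numpy(), equal_nan=True
)
print(same)
```

On a large frame the vectorized version should be considerably faster, because the loop does one label lookup and one assignment per row.
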
yatu