Iterate through data frame

Question

My code pulls a dataframe object and I'd like to mask the dataframe: if a value is <= 15, change it to 1; otherwise change it to 0.

import pandas as pd
XTrain = pd.read_excel('C:\\blahblahblah.xlsx')

for each in XTrain:
  if each <= 15:
    each = 1
  else:
    each = 0

I'm coming from VBA and .NET, so I know it's not very pythonic, but it seems super easy to me... The code hits an error since it iterates through the df header. So I tried to check for type:

for each in XTrain:
  if isinstance(each, str) is False:
    if each <= 15:
      each = 1
    else:
      each = 0

This time it got to the final header but did not progress into the dataframe. This makes me think I am not looping through the dataframe correctly. I've been stumped for hours; could anyone send me a little help?

Thank you!

Did you look into [How to iterate over rows in a DataFrame in Pandas](https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas) ? — niraj, Nov 04 '20 at 18:26
Thank you, I saw that article and tried iterrow() but it didn't work (plus at the time I thought the issue was with type). I'll revisit this. Thanks. — St. Jimmy, Nov 04 '20 at 18:31
`itertuples()` will preserve data types, and `iterrows()` won't. Even better are the non-iteration approaches presented below; docs [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html) — jsmart, Nov 04 '20 at 20:34

Quang Hoang · Answer 1 · 2020-11-04T18:37:26.603

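for each in XTrain always loops through the column names only. That is how pandas defines iteration over a DataFrame, which is why comparing each to 15 fails: each is a column label, not a cell value.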
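A minimal sketch of that behaviour, assuming a small made-up frame (the column names a and b are placeholders, not taken from the question's Excel file):

```python
import pandas as pd

# Hypothetical stand-in for XTrain; the real data comes from the Excel file.
demo = pd.DataFrame({"a": [1, 20], "b": [30, 4]})

# Iterating over the frame itself yields column labels, not cell values.
labels = [each for each in demo]
print(labels)
```

With string labels like these, each <= 15 compares a str with an int, which is why the original loop raised an error.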

Pandas supports comparison and arithmetic operations between a dataframe and a number directly, applied to every cell at once. So you want:

 # le is less than or equal to
 XTrain.le(15).astype(int)

 # same as
 # (XTrain <= 15).astype(int)
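Note that the expression returns a new dataframe and leaves XTrain unchanged, so the result has to be assigned to something. A small sketch with made-up data (the values and the name masked are assumptions for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the frame read from Excel.
XTrain = pd.DataFrame({"a": [1, 20, 15], "b": [30, 4, 16]})

# le(15) builds a boolean frame; astype(int) turns True/False into 1/0.
# Nothing is modified in place, so keep the result by assigning it.
masked = XTrain.le(15).astype(int)
print(masked)
```

Assigning back to XTrain instead of masked would overwrite the original values, which is what the question asks for.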

If you really want to iterate (don't), remember that a dataframe is two-dimensional, so you need one loop over the rows and another over the cells in each row. Something like this:

for index, row in df.iterrows():
    for cell in row:
        if cell <= 15:
            # do something
            # note: cell = 1 only rebinds the local name; it does not
            # modify the cell in the original dataframe
            pass
        else:
            # do something else
            pass
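If an explicit loop really is unavoidable, one way to make the writes stick is to assign by label with DataFrame.at rather than rebinding the loop variable. This is only a sketch with made-up data; the name result and the choice to write into a copy are assumptions, and the loop is far slower than the vectorized form:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 20], "b": [30, 4]})  # hypothetical data
result = df.copy()  # write into a copy so the frame being read is not mutated

for index in df.index:
    for column in df.columns:
        # Assigning through .at targets a labelled cell in the frame itself.
        result.at[index, column] = 1 if df.at[index, column] <= 15 else 0

print(result)
```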

ansev · Answer 2 · 2020-11-04T18:43:37.660

Setup

df = pd.DataFrame({'A' : range(0, 20, 2), 'B' : list(range(10, 19)) + ['a']})
print(df)

    A   B
0   0  10
1   2  11
2   4  12
3   6  13
4   8  14
5  10  15
6  12  16
7  14  17
8  16  18
9  18   a

Solution: use pd.to_numeric to avoid problems with str values, then DataFrame.le

df.apply(lambda x: pd.to_numeric(x, errors='coerce')).le(15).astype(int)

Output

   A  B
0  1  1
1  1  1
2  1  1
3  1  1
4  1  1
5  1  1
6  1  0
7  1  0
8  0  0
9  0  0

If you want to keep string values:

df2 = df.apply(lambda x: pd.to_numeric(x, errors='coerce'))
new_df = df2.where(lambda x: x.isna(), df2.le(15).astype(int)).fillna(df)
print(new_df)


   A  B
0  1  1
1  1  1
2  1  1
3  1  1
4  1  1
5  1  1
6  1  0
7  1  0
8  0  0
9  0  a
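Both variants in this answer rely on errors='coerce' turning non-numeric entries into NaN. A rough sketch of that step on a single made-up column (the Series and variable names are hypothetical):

```python
import pandas as pd

column = pd.Series([10, 16, "a"])  # hypothetical mixed column

numeric = pd.to_numeric(column, errors="coerce")  # "a" becomes NaN
mask = numeric.le(15).astype(int)                 # NaN <= 15 is False, hence 0

print(numeric.tolist())
print(mask.tolist())
```

Because NaN compares as False, a string cell ends up as 0 unless it is restored afterwards, which is what the where/fillna step does.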

score 0 · Answer 3 · answered Nov 04 '20 at 18:31

Use applymap to apply a function to each element of the dataframe, and a lambda to write that function inline.

df.applymap(lambda x: x if isinstance(x, str) else 1 if x <= 15 else 0)
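In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map. A hedged sketch that works on either version, with made-up data and a hypothetical helper named to_flag:

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 16], "B": [10, "a"]})  # hypothetical data

def to_flag(x):
    # Leave strings untouched; flag numbers as 1 (<= 15) or 0.
    return x if isinstance(x, str) else (1 if x <= 15 else 0)

# DataFrame.map replaces applymap from pandas 2.1 on;
# fall back to applymap on older versions.
elementwise = df.map if hasattr(df, "map") else df.applymap
flags = elementwise(to_flag)
print(flags)
```

Like the other approaches, this returns a new dataframe rather than changing df in place.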

Shradha