1

My code loads a dataframe and I'd like to mask it: if a value is <= 15, change it to 1; otherwise change it to 0.

import pandas as pd
XTrain = pd.read_excel('C:\\blahblahblah.xlsx')

for each in XTrain:
  if each <= 15:
    each = 1
  else:
    each = 0

I'm coming from VBA and .NET, so I know it's not very Pythonic, but it seems like it should be super easy... The code hits an error because it iterates through the df header. So I tried checking the type:

for each in XTrain:
  if isinstance(each, str) is False:
    if each <= 15:
      each = 1
    else:
      each = 0

This time it got to the final header but did not progress into the dataframe, which makes me think I am not looping through the dataframe correctly. I've been stumped for hours; could anyone send me a little help?

Thank you!

St. Jimmy
  • Did you look into [How to iterate over rows in a DataFrame in Pandas](https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas)? – niraj Nov 04 '20 at 18:26
  • Thank you, I saw that article and tried `iterrows()` but it didn't work (plus at the time I thought the issue was with type). I'll revisit this. Thanks. – St. Jimmy Nov 04 '20 at 18:31
  • np, probably some solutions below will work. – niraj Nov 04 '20 at 18:33
  • `itertuples()` will preserve data types, and `iterrows()` won't. Even better are the non-iteration approaches presented below; docs [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html) – jsmart Nov 04 '20 at 20:34

3 Answers

3

`for each in XTrain` always loops through the column names only: iterating over a DataFrame yields its column labels, not its values. That's how pandas is designed.
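
A minimal sketch of that behaviour, using a small made-up frame (the column names and values are illustrative, not taken from the question's spreadsheet):

```python
import pandas as pd

# Hypothetical stand-in for XTrain; the real data comes from the Excel file.
XTrain = pd.DataFrame({"A": [10, 20], "B": [15, 16]})

# Iterating over a DataFrame yields its column labels, not its cell values.
labels = [each for each in XTrain]
print(labels)  # ['A', 'B']
```

This would explain why the `isinstance(each, str)` check in the question skipped every item: each one was a column label rather than a cell.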

Pandas supports comparison and arithmetic operations against numbers directly, applied element-wise. So you want:

 # le is less than or equal to
 XTrain.le(15).astype(int)

 # same as
 # (XTrain <= 15).astype(int)
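
As a rough illustration on made-up data (the frame below is hypothetical), both spellings give the same 0/1 mask. Neither modifies the frame in place, so the result has to be assigned back:

```python
import pandas as pd

# Hypothetical stand-in for the frame read from Excel.
XTrain = pd.DataFrame({"A": [10, 20], "B": [15, 16]})

# Method form and operator form of the same element-wise comparison.
mask_method = XTrain.le(15).astype(int)
mask_operator = (XTrain <= 15).astype(int)

# The comparison returns a new frame, so rebind the name to keep the result.
XTrain = mask_method
print(XTrain)
```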

If you really want to iterate (don't), remember that a dataframe is two-dimensional, so you need a nested loop. Something like this:

for index, row in df.iterrows():
    for cell in row:
        if cell <= 15:
            # do something
            # note: cell = 1 only rebinds the local name; it will not
            # modify the cell in the original dataframe
            pass
        else:
            # do something else
            pass
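
To illustrate the point about `cell = 1`, here is a small sketch on a hypothetical frame: rebinding the loop variable leaves the frame untouched, whereas writing through `DataFrame.at` changes it. It only shows the mechanics; the vectorized comparison remains the better tool.

```python
import pandas as pd

# Hypothetical data, chosen only to exercise both branches.
df = pd.DataFrame({"A": [10, 20], "B": [15, 16]})
original = df.copy()

# Attempt 1: assigning to the loop variable rebinds a local name only.
for index, row in df.iterrows():
    for cell in row:
        cell = 1 if cell <= 15 else 0
unchanged = df.equals(original)  # the frame still holds the old values

# Attempt 2: iterate over the copy and write into df by label.
for index, row in original.iterrows():
    for column, cell in row.items():
        df.at[index, column] = 1 if cell <= 15 else 0

print(df)
```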
        
Quang Hoang
2

Setup

import pandas as pd

df = pd.DataFrame({'A' : range(0, 20, 2), 'B' : list(range(10, 19)) + ['a']})
print(df)

    A   B
0   0  10
1   2  11
2   4  12
3   6  13
4   8  14
5  10  15
6  12  16
7  14  17
8  16  18
9  18   a

Solution: use pd.to_numeric to avoid problems with str values, then DataFrame.le

df.apply(lambda x: pd.to_numeric(x, errors='coerce')).le(15).astype(int)

Output

   A  B
0  1  1
1  1  1
2  1  1
3  1  1
4  1  1
5  1  1
6  1  0
7  1  0
8  0  0
9  0  0
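
Why row 9 of column B ends up as 0 is easier to see from the intermediate step. A small sketch with the same kind of mixed column (the series below is made up for illustration): `errors='coerce'` turns the unparseable string into NaN, and a comparison against NaN is False.

```python
import pandas as pd

# Hypothetical mixed column: two numbers and one string.
mixed = pd.Series([10, 16, "a"])

# Values that cannot be parsed as numbers become NaN.
coerced = pd.to_numeric(mixed, errors="coerce")
print(coerced.tolist())  # [10.0, 16.0, nan]

# NaN <= 15 evaluates to False, which astype(int) turns into 0.
flags = coerced.le(15).astype(int)
print(flags.tolist())  # [1, 0, 0]
```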

If you want to keep string values:

df2 = df.apply(lambda x: pd.to_numeric(x, errors='coerce'))
new_df = df2.where(lambda x: x.isna(), df2.le(15).astype(int)).fillna(df)
print(new_df)


   A  B
0  1  1
1  1  1
2  1  1
3  1  1
4  1  1
5  1  1
6  1  0
7  1  0
8  0  0
9  0  a
ansev
0

Use applymap to apply a function to each element of the dataframe, and a lambda to write the function. Strings are left unchanged.

df.applymap(lambda x: x if isinstance(x, str) else 1 if x <= 15 else 0)
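
A self-contained sketch of the element-wise approach on made-up data. It assumes the string check is meant to apply to the function's own argument, and it picks `DataFrame.map` when available because `applymap` was deprecated in pandas 2.1 in favour of it. The helper name `to_flag` is hypothetical.

```python
import pandas as pd

# Hypothetical frame with one string cell.
df = pd.DataFrame({"A": [10, 16], "B": [15, "a"]})

def to_flag(x):
    """Leave strings untouched; map numbers to 1 if <= 15, else 0."""
    if isinstance(x, str):
        return x
    return 1 if x <= 15 else 0

# Assumption: DataFrame.map exists on pandas >= 2.1; older versions use applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap
flags = elementwise(to_flag)
print(flags)
```

Calling a Python function once per cell is slower than the vectorized comparison, so this fits best when the frame is small or the per-cell logic cannot be vectorized.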
Shradha