1

Below is my DataFrame:

df = pd.DataFrame({'University':['4', '0', '3'],'Subject':['1', '2', '0']})

Just like 'University' & 'Subject' I have 50 other columns that end with the

column 'Score':['3', '1', '0'].

I would like to change the values for all 50 columns starting from the University all the way to Score. The condition should be if a value is => 1 then change them to 1 otherwise all other values should be zero.

I know that we are able to do it with this method:

df.loc[dt['University'] > 1] = 1

However, above I will have to type each and every one of my 50 columns. Is there a way where I could specifically mention starting from 'University' and ending at 'Score' which should include all the columns in between?

I am new to python so please try to be as clear as possible. Thank you.

Ammar Kamran
  • 139
  • 1
  • 1
  • 8
  • Why don't you just get `columns` with `df.columns`, filter out score column and iterate your method? – vladsiv Dec 06 '20 at 15:41
  • @VladSiv Please show me how that would work out. I am also not sure how to individually add columns in the same code. It only seems to work for 1 column at one time. – Ammar Kamran Dec 06 '20 at 15:45

3 Answers3

1

I think this could help you:

df = pd.DataFrame(
    {
        "University": ["4", "0", "3"],
        "Subject": ["1", "2", "0"],
        "Test1": ["1", "2", "0"],
        # Other columns
        "Score": ["3", "0", "4"],
    }
)

# Change the type, since you need int
df = df.astype(int)

for column in df.columns[:-1]:
    df.loc[df[column] > 1, column] = 1
print(df.head())

Which gives:

   University  Subject  Test1  Score
0           1        1      1      3
1           0        1      1      0
2           1        0      0      4

I believe that's what you want if I understood you correctly.

vladsiv
  • 2,718
  • 1
  • 11
  • 21
  • Thank you. Yes, it does solve it however, I need to do it for specific columns and not the entire Dataframe. Is there a way to do it starting from the University column and ending at the Score column, while including all the columns in the middle? – Ammar Kamran Dec 06 '20 at 16:54
  • @AmmarKamran `df.columns` returns a list of column names. If you want to exclude first and the last element you can pass `df.columns[1:-1]`. Please see [here](https://stackoverflow.com/questions/509211/understanding-slice-notation) for more information on slice notation. – vladsiv Dec 06 '20 at 17:54
1

This could work:

df_new = pd.DataFrame(np.where(df.values >= 1, 1, 0), columns=df.columns)

EDIT: This is not very clean but it might do the job:

df.loc[:, 'University':'Score'] = df.loc[:, 'University':'Score'].apply(lambda x: (x >= 1).astype('int'))

EDIT2: For more conditions I would do something like this, but again I don't think this is optimal:

def conditions(row):
    mask1 = (row.values >= 1) & (row.values < 5)
    mask2 = (row.values >= 5) & (row.values < 10)
    mask3 = (row.values >= 10)
    
    row = mask1.astype('int') + mask2.astype('int')*2 + mask3.astype('int')*3
    return row

df.loc[:, 'University':'Score'] = df.loc[:, 'University':'Score'].apply(conditions)

Apply is quite slow in general, and though it might not be an issue for your case, I would probably opt to select all columns of the df and drop the ones you don't want to include in the operation and use .loc (instead of trying to get the range of columns).

LazyEval
  • 769
  • 1
  • 8
  • 22
  • Thank you, however, this runs over my entire dataframe, I need to do it only for specific columns. My data frame also has other columns that have strings and Boolean values which is why I only need to do it for selected columns. The columns need to start from University and end at Score. – Ammar Kamran Dec 06 '20 at 16:50
  • Thank you once again the suggestion from the final edit works. Could you also please suggest if there is a way to to multiple suggestions. For eg values >10 should be 3 value >5 but < 10 should be 2 and values >=1 but < 5 should be 1. All other values should be 0. – Ammar Kamran Dec 06 '20 at 17:21
0

for any number of columns except the last one score

for i in df.columns[:-1]:
    df[i]=df[i].apply(lambda x: 1 if int(x)>=1 else 0)
Nour-Allah Hussein
  • 1,439
  • 1
  • 8
  • 17