How to treat a column like a binary variable

Question

I have a column 'type' with the values normal, scalar, linear, perpendicular and tabular.

I need to rewrite this column so that every 'normal' value is replaced by -1 and every other value by 1. The input is as follows:

type:
normal
normal
tabular
scalar
normal
linear

And the expected output:

type
-1
-1
1
1
-1
1

For this, I have tried:

data.loc[data['type'] != 'normal', 'type'] = -1

This converts every value in the column to -1. Moreover, when I then run:

data.loc[data['type'] == 'normal', 'type'] = 1

It has no effect, and all values remain -1 (both the 'normal' ones and the others). Can someone help me with this?

Additionally, in another part of the project I have to apply an undersampling technique to obtain a balanced dataset in which all samples with the value 'normal' belong to the negative class and all others to the positive class. I'm stuck on both problems, since they are similar in nature.
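A minimal sketch of random undersampling for the balanced set described above, assuming pandas and a hypothetical toy DataFrame (the column `feature`, the column `label` and the helper `undersample` are illustrative names, not part of the question). The labels use the same -1/1 mapping, then every class is randomly downsampled to the size of the smallest one. Libraries such as imbalanced-learn also offer a `RandomUnderSampler` class for this purpose.

```python
import pandas as pd

# Hypothetical toy data: 'normal' is the majority class here (6 vs 4 rows).
df = pd.DataFrame({
    'type': ['normal', 'normal', 'tabular', 'scalar', 'normal', 'linear',
             'normal', 'normal', 'perpendicular', 'normal'],
    'feature': range(10),
})

# Binary label: 'normal' -> -1 (negative class), everything else -> 1 (positive class).
df['label'] = (df['type'] != 'normal') * 2 - 1

def undersample(frame, label_col='label', random_state=0):
    """Randomly drop rows from larger classes until all classes have the same size."""
    counts = frame[label_col].value_counts()
    n_min = int(counts.min())
    parts = [
        frame[frame[label_col] == cls].sample(n=n_min, random_state=random_state)
        for cls in counts.index
    ]
    # Concatenate the per-class samples and shuffle the rows.
    return pd.concat(parts).sample(frac=1, random_state=random_state)

balanced = undersample(df)
print(balanced['label'].value_counts())
```

Fixing `random_state` makes the selection reproducible; drop it to get a different random subset on each run.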

Can you include a small column example input `...` and its expected output `...` so that both can be copy-pasted into a Python interpreter? (Unless your question is similar to [Pandas DataFrame: replace all values in a column, based on condition](https://stackoverflow.com/q/31511997/5821790).) — Vepir, Feb 14 '21 at 09:49
@Vepir the solution I tried (which is also included in the question) is the same as the one given in the link you've posted. I've added the column examples for reference — wakanada, Feb 14 '21 at 09:53

score 1 · Answer 1 · answered Feb 14 '21 at 10:09

You could also transform the boolean mask into your desired labels with a little arithmetic, as a quick one-liner:

import pandas as pd

data = {
    'type': ['normal', 'normal', 'tabular', 'scalar', 'normal', 'linear'],
    'random_other_column': range(6)
}

df = pd.DataFrame.from_dict(data)
# (df['type'] != 'normal') is a boolean Series: False/True behave as 0/1
# multiplying by 2 maps them to 0/2
# subtracting 1 maps them to -1/1
df['type'] = (df['type'] != 'normal') * 2 - 1
print(df)

output:

   type  random_other_column
0    -1                    0
1    -1                    1
2     1                    2
3     1                    3
4    -1                    4
5     1                    5
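An equivalent variant that names both labels explicitly uses `numpy.where(condition, x, y)`, which takes `x` where the condition is true and `y` elsewhere. A sketch with the same sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'type': ['normal', 'normal', 'tabular', 'scalar', 'normal', 'linear'],
    'random_other_column': range(6),
})

# Element-wise choice: -1 where the value is 'normal', 1 everywhere else.
df['type'] = np.where(df['type'] == 'normal', -1, 1)
print(df)
```

This form carries over directly to label pairs other than -1/1, where the arithmetic trick would need different constants.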

score 0 · Answer 2 · Naphat Amundsen · 2021-02-14T11:20:22.700

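import pandas as pd

df = pd.DataFrame({
    'type': ['normal', 'normal', 'tabular', 'scalar', 'normal', 'linear']
})

mask = df['type'] == 'normal'
# Assign through .loc (row mask + column label) rather than the chained
# df['type'][mask] = ..., which can write to a temporary copy and
# triggers SettingWithCopyWarning.
df.loc[mask, 'type'] = -1
df.loc[~mask, 'type'] = 1

print(df)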

#  type
#0   -1
#1   -1
#2    1
#3    1
#4   -1
#5    1

edited Feb 14 '21 at 11:20

answered Feb 14 '21 at 09:58


Does not work with multiple columns. The entire dataframe gets assigned `-1` or `1` – Stefan B Feb 14 '21 at 10:20
@StefanB Oops, I forgot to do df['type'] first, I have edited it now – Naphat Amundsen Feb 14 '21 at 11:23
This method gives the error: A value is trying to be set on a copy of a slice from a DataFrame – wakanada Feb 14 '21 at 11:56
@wakanada Hmm, weird. It works fine with me with Python 3.8.5, NumPy 1.19.0, pandas 1.0.5 – Naphat Amundsen Feb 14 '21 at 12:17
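Regarding the comments above: a minimal sketch, assuming a hypothetical second column `other`, showing that a single `.loc[mask, 'type']` assignment changes only the `type` column and avoids the chained indexing behind the SettingWithCopyWarning. The mask is computed before the column is overwritten; a condition re-evaluated afterwards would no longer find any 'normal' values to match.

```python
import pandas as pd

# 'other' is a hypothetical second column, only there to show it is left untouched.
df = pd.DataFrame({
    'type': ['normal', 'normal', 'tabular', 'scalar', 'normal', 'linear'],
    'other': ['a', 'b', 'c', 'd', 'e', 'f'],
})

# Compute the mask once, before any value in 'type' is overwritten.
mask = df['type'] == 'normal'

# Replace the whole column with integer 1s, then set the 'normal' rows to -1.
# .loc takes the row mask and the column label in one call, so only 'type'
# is modified and no chained indexing (df['type'][mask] = ...) is involved.
df['type'] = 1
df.loc[mask, 'type'] = -1
print(df)
```

Writing the integer column first also means the result is an integer column rather than an object column holding mixed values.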

score 0 · Answer 3 · answered Feb 14 '21 at 10:22

You can use a list comprehension to replace it:

import pandas as pd

df = pd.DataFrame({'type': ['normal', 'normal', 'tabular', 'scalar', 'normal', 'linear']})
df

      type
0   normal
1   normal
2  tabular
3   scalar
4   normal
5   linear

# 'normal' -> -1, every other value -> 1
df['type'] = [-1 if i == 'normal' else 1 for i in df['type']]
df

   type
0    -1
1    -1
2     1
3     1
4    -1
5     1
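Staying within pandas, the comprehension can also be replaced by building a boolean Series with `Series.eq` and translating it through a dictionary with `Series.map`. A sketch with the same input:

```python
import pandas as pd

df = pd.DataFrame({'type': ['normal', 'normal', 'tabular', 'scalar', 'normal', 'linear']})

# eq('normal') yields True/False per row; map() looks each boolean up in the dict.
df['type'] = df['type'].eq('normal').map({True: -1, False: 1})
print(df)
```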
