0

I have a column 'type' with the values normal, scalar, linear, perpendicular, and tabular.

I have to rewrite this column so that every 'normal' value is replaced by -1 and everything else by 1. The input is as follows:

type:
normal
normal
tabular
scalar
normal
linear

and the expected output

type
-1
-1
1
1
-1
1

For this, I have tried:

data.loc[data['type'] != 'normal', 'type'] = -1

This just converts every value in the column to -1. Also, when I then run this:

data.loc[data['type'] == 'normal', 'type'] = 1

It has no effect, and all values remain -1 (both 'normal' and the others). Can someone help me with this?

Additionally, in another part of the project, I have to apply an undersampling technique to obtain a balanced data set in which all samples with the value 'normal' belong to the negative class and all others to the positive class. The two problems are similar in nature, and I'm stuck on both.

wakanada
  • Can you include a small column example input `...` and its expected output `...` so that both can be copy pasted into a python interpreter. (Unless your question is similar to [Pandas DataFrame: replace all values in a column, based on condition](https://stackoverflow.com/q/31511997/5821790).) – Vepir Feb 14 '21 at 09:49
  • @Vepir the solution I tried (which is also included in the question) is the same as given in the link you've posted. I've added the column examples for reference – wakanada Feb 14 '21 at 09:53

3 Answers

1

You could also turn the boolean vector from the comparison into your desired labels with a little arithmetic, as a quick one-liner:

import pandas as pd

data = {
    'type': ['normal', 'normal', 'tabular', 'scalar', 'normal', 'linear'],
    'random_other_column': range(6)
}

df = pd.DataFrame.from_dict(data)
# (df['type'] != 'normal') is a boolean Series; False/True behave as 0/1
# multiplying by 2 maps 0/1 to 0/2
# subtracting 1 maps 0/2 to -1/1
df['type'] = (df['type'] != 'normal') * 2 - 1
print(df)

output:

   type  random_other_column
0    -1                    0
1    -1                    1
2     1                    2
3     1                    3
4    -1                    4
5     1                    5
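If every row still comes out with the same label on your real data, as described in the question, one possible cause is that the stored strings are not exactly 'normal' (stray whitespace or a trailing character). This is only an assumption, since the original data isn't shown. The sketch below uses made-up values and a hypothetical `label` column to show how to inspect and clean the strings before comparing:

```python
import pandas as pd

# Hypothetical data: the labels carry stray whitespace or a trailing dot,
# so none of the 'normal' rows is exactly equal to 'normal'.
raw = pd.DataFrame({
    'type': ['normal ', ' normal', 'tabular.', 'scalar', 'normal.', 'linear']
})

# Inspect the distinct raw values; repr() makes hidden whitespace visible.
print([repr(v) for v in raw['type'].unique()])

# Normalise: trim surrounding whitespace, then drop any trailing dots.
cleaned = raw['type'].str.strip().str.rstrip('.')

# Same boolean arithmetic as above, applied to the cleaned strings.
raw['label'] = (cleaned != 'normal') * 2 - 1
print(raw['label'].tolist())
```

Checking `unique()` first is cheap and shows quickly whether the comparison can match at all.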
Stefan B
0
import numpy as np 
import pandas as pd 

df = pd.DataFrame({
    'type':['normal', 'normal', 'tabular', 'scalar', 'normal', 'linear']
})

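# True for rows whose type is 'normal'
mask = df['type'] == 'normal'
# np.where builds the new column in one step: -1 where mask is True, 1 elsewhere.
# This avoids chained assignment (df['type'][mask] = ...), which may write to a copy.
df['type'] = np.where(mask, -1, 1)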

print(df)

#  type
#0   -1
#1   -1
#2    1
#3    1
#4   -1
#5    1
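For readers who prefer the `.loc` style used in the question, here is a minimal sketch of the same mask idea. It assumes you are free to write the result into a separate column; the name `label` is hypothetical and not from the question. The mask is computed once, while the column still holds the original strings, so the second step never compares against values that were already overwritten:

```python
import pandas as pd

df = pd.DataFrame({
    'type': ['normal', 'normal', 'tabular', 'scalar', 'normal', 'linear']
})

# Compute the mask once, before any value is changed.
is_normal = df['type'] == 'normal'

# 'label' is a hypothetical new column; start every row at 1 ...
df['label'] = 1
# ... then overwrite only the 'normal' rows through .loc (no chained indexing).
df.loc[is_normal, 'label'] = -1

print(df)
```

Keeping the original strings in 'type' also makes it easy to re-check the mapping later.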
Naphat Amundsen
0

You can use a list comprehension to replace it:

import pandas as pd

df = pd.DataFrame({'type':['normal','normal','tabular','scalar','normal','linear']})
df

     type
0  normal
1  normal
2 tabular
3  scalar
4  normal
5  linear

# -1 for 'normal', 1 for everything else
df['type'] = [-1 if i == 'normal' else 1 for i in df.type]
df

   type
0    -1
1    -1
2     1
3     1
4    -1
5     1
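The second part of the question (undersampling to a balanced set) isn't covered by the snippet above. Here is a minimal sketch, assuming plain random undersampling of the larger class is acceptable. The data, the `feature` column, and the `label` column are made up for illustration:

```python
import pandas as pd

# Hypothetical imbalanced data: 6 'normal' rows and 3 rows of other types.
df = pd.DataFrame({
    'type': ['normal'] * 6 + ['tabular', 'scalar', 'linear'],
    'feature': range(9),
})

# Binary label: -1 for 'normal' (negative class), 1 for everything else.
df['label'] = [-1 if t == 'normal' else 1 for t in df['type']]

# Size of the smaller class; every class is cut down to this many rows.
n_min = df['label'].value_counts().min()

# Draw n_min rows per class without replacement; random_state fixes the draw.
balanced = pd.concat(
    [group.sample(n=n_min, random_state=0) for _, group in df.groupby('label')]
).sort_index()

print(balanced['label'].value_counts())
```

If you already use the imbalanced-learn package, its `RandomUnderSampler` serves the same purpose. The pandas version above just avoids the extra dependency.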
StupidWolf