DataFrame column value not updated by 'replace'

Question

I have a dataset that contains records like:


@attribute pelvic_incidence numeric
@attribute pelvic_tilt numeric
@attribute lumbar_lordosis_angle numeric
@attribute sacral_slope numeric
@attribute pelvic_radius numeric
@attribute degree_spondylolisthesis numeric

@data
74.09473084,18.82372712,76.03215571,55.27100372,128.4057314,73.38821617,Abnormal
87.67908663,20.36561331,93.82241589,67.31347333,120.9448288,76.73062904,Abnormal
48.25991962,16.41746236,36.32913708,31.84245726,94.88233607,28.34379914,Abnormal
38.50527283,16.96429691,35.11281407,21.54097592,127.6328747,7.986683227,Normal
54.92085752,18.96842952,51.60145541,35.952428,125.8466462,2.001642472,Normal
44.36249017,8.945434892,46.90209626,35.41705528,129.220682,4.994195288,Normal
48.3189305,17.45212105,47.99999999,30.86680945,128.9803079,-0.910940567,Normal

I wish to create a `DataFrame` from the given dataset and then change the labels in the column named 'class' from 'Abnormal' to 0 and 'Normal' to 1. I did the following:

import pandas as pd
from scipy.io.arff import loadarff

raw_data = loadarff('column_2C_weka.arff')
df = pd.DataFrame(raw_data[0])
df["class"].replace({"Abnormal": "0", "Normal": "1"}, inplace=True)
print(df['class'])

Unfortunately, the values in the 'class' column are not updated, i.e. it still shows the same 'Abnormal' and 'Normal' labels.

To be more sure about how the replace method works, I tried the same with a small DataFrame:

df = pd.DataFrame({"column1": ["a", "b", "a"]})
print(df)
df["column1"].replace({"a": "x", "b": "y"}, inplace=True)
print(df)

Surprisingly, it does change the values from a to x and b to y:

  column1
0       a
1       b
2       a
  column1
0       x
1       y
2       x

I am baffled. Why is it not occurring with my dataset but gets replaced with this DataFrame?

Thanks in advance.

P.S.: Something like this worked for me:

df['class'] = df['class'].astype(str).str.replace('Abnormal', '0')

I don't have a clue why this produced the desired output when none of the previous attempts did! Any help is appreciated.

With your sample, your last column is not loaded by `loadarff`. Do you use `loadarff` from `scipy`? — Corralien, Jan 25 '22 at 07:12
explanation is in [this](https://stackoverflow.com/a/59242208/2901002) answer. — jezrael, Jan 25 '22 at 07:13
@Corralien Yes, I wrote `from scipy.io.arff import loadarff`. But why isn't the last column loaded? — QUEEN, Jan 25 '22 at 07:14
@jezrael I removed the `inplace = True` part and also tried by removing the `inplace` method totally, but still, get the same output. — QUEEN, Jan 25 '22 at 07:17
If you want to replace `inplace`, you should not slice your dataframe: `df.replace({'class': {'Abnormal': '0' , 'Normal' : '1'}}, inplace=True)` — Corralien, Jan 25 '22 at 07:17
`df["column1"] = df["column1"].replace({"a": "x", "b": "y"})` not working? — jezrael, Jan 25 '22 at 07:17
@Corralien I tried your code snippet still the same problem persists! — QUEEN, Jan 25 '22 at 07:20
@jezrael I tried your code snippet still the same problem persists! — QUEEN, Jan 25 '22 at 07:20
@QUEEN - then there should be trailed whitespaces? Like `'a '` instead `'a'` — jezrael, Jan 25 '22 at 07:21
Try `df['class'] = df['class'].str.strip().replace({'Abnormal': '0' , 'Normal' : '1'})` — Corralien, Jan 25 '22 at 07:22
@Corralien I tried this .strip one but it showed the following error `TypeError: Cannot use .str.strip with values of inferred dtype 'bytes'. ` — QUEEN, Jan 25 '22 at 18:33

Corralien · Answer 1 · 2022-01-25T18:51:02.920

1

It seems the values in your 'class' column are bytes and not str, so decode them before replacing:

df['class'] = df['class'].str.decode('utf-8').replace({'Abnormal': 0, 'Normal': 1})
print(df)

# Output
   pelvic_incidence  pelvic_tilt  lumbar_lordosis_angle  sacral_slope  pelvic_radius  degree_spondylolisthesis class
0         74.094731    18.823727              76.032156     55.271004     128.405731                 73.388216     0
1         87.679087    20.365613              93.822416     67.313473     120.944829                 76.730629     0
2         48.259920    16.417462              36.329137     31.842457      94.882336                 28.343799     0
3         38.505273    16.964297              35.112814     21.540976     127.632875                  7.986683     1
4         54.920858    18.968430              51.601455     35.952428     125.846646                  2.001642     1
5         44.362490     8.945435              46.902096     35.417055     129.220682                  4.994195     1
6         48.318931    17.452121              48.000000     30.866809     128.980308                 -0.910941     1
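To illustrate why the original `replace` call had no effect, here is a minimal sketch that does not need the ARFF file. The small `DataFrame` below is a hypothetical stand-in for the 'class' column, assuming (as the error message in the comments suggests) that its values are bytes:

```python
import pandas as pd

# Hypothetical stand-in for the 'class' column as loaded from the ARFF file:
# the values are bytes objects, not str.
df = pd.DataFrame({"class": [b"Abnormal", b"Normal", b"Abnormal"]})

# The dict keys are str, and a str never compares equal to bytes,
# so no value matches and the column comes back unchanged.
unchanged = df["class"].replace({"Abnormal": 0, "Normal": 1})

# Decoding turns each bytes value into str; now the keys match.
decoded = df["class"].str.decode("utf-8")
replaced = decoded.replace({"Abnormal": 0, "Normal": 1})

print(unchanged.tolist())
print(replaced.tolist())
```

The small `column1` example from the question worked because its values were ordinary str from the start.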

edited Jan 25 '22 at 18:51

answered Jan 25 '22 at 18:40


@QUEEN. Does it solve your problem now? – Corralien Jan 25 '22 at 18:44
I tried this but it says `ValueError: Columns must be same length as key`. Actually something like `df['class'] = df['class'].astype(str).str.replace('Abnormal', '0')` worked out for me but I don't know why. I am editing my question accordingly. – QUEEN Jan 25 '22 at 18:44
Is it possible for you to share your data? – Corralien Jan 25 '22 at 18:47
Thank you so much. Finally `df['class'] = df['class'].str.decode('utf-8').replace({'Abnormal': 0, 'Normal': 1})` worked! Any idea why this was successful? – QUEEN Jan 25 '22 at 18:47
I have attached some of the rows and columns in the question. The entire dataset is too large to be added. – QUEEN Jan 25 '22 at 18:48
I'm really surprised too! I don't know why. (don't forget to accept my answer :)) – Corralien Jan 25 '22 at 18:49
The problem with copy/paste is that we lose the encoding. – Corralien Jan 25 '22 at 18:49
Ohh, I wasn't aware. Which copy/paste are you talking about? How does the encoding change with copy/paste? – QUEEN Jan 25 '22 at 19:10
When you load your file with `loadarff` it seems text fields are converted to bytes because the file is probably opened in binary mode. – Corralien Jan 25 '22 at 19:14
Okay, I got it. Thank you :) – QUEEN Jan 25 '22 at 19:14
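If several text columns come back as bytes, they can all be decoded in one pass. This is only a sketch under assumptions: the `records` array is a made-up stand-in for `raw_data[0]`, built as a NumPy structured array with a bytes field, and the check with `isinstance` is just one possible way to spot such columns:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for raw_data[0]: a structured array in which
# the text field is stored as fixed-width bytes ("S8").
records = np.array(
    [(74.09, b"Abnormal"), (38.51, b"Normal")],
    dtype=[("pelvic_incidence", "f8"), ("class", "S8")],
)
df = pd.DataFrame(records)

# Decode every column whose values are all bytes; leave the rest alone.
for col in df.columns:
    if df[col].map(lambda v: isinstance(v, bytes)).all():
        df[col] = df[col].str.decode("utf-8")

print(df["class"].tolist())
```

After this loop, `replace` with str keys should behave as it did in the small `column1` example.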
