
I have a dataset that contains records like this:


@attribute pelvic_incidence numeric
@attribute pelvic_tilt numeric
@attribute lumbar_lordosis_angle numeric
@attribute sacral_slope numeric
@attribute pelvic_radius numeric
@attribute degree_spondylolisthesis numeric

@data
74.09473084,18.82372712,76.03215571,55.27100372,128.4057314,73.38821617,Abnormal
87.67908663,20.36561331,93.82241589,67.31347333,120.9448288,76.73062904,Abnormal
48.25991962,16.41746236,36.32913708,31.84245726,94.88233607,28.34379914,Abnormal
38.50527283,16.96429691,35.11281407,21.54097592,127.6328747,7.986683227,Normal
54.92085752,18.96842952,51.60145541,35.952428,125.8466462,2.001642472,Normal
44.36249017,8.945434892,46.90209626,35.41705528,129.220682,4.994195288,Normal
48.3189305,17.45212105,47.99999999,30.86680945,128.9803079,-0.910940567,Normal

I want to create a `DataFrame` from this dataset and then relabel the column named 'class', changing 'Abnormal' to 0 and 'Normal' to 1. I did the following:

import pandas as pd
from scipy.io.arff import loadarff

raw_data = loadarff('column_2C_weka.arff')
df = pd.DataFrame(raw_data[0])
df["class"].replace({"Abnormal": "0", "Normal": "1"}, inplace=True)
print(df['class'])

Unfortunately, the 'class' column is not updated, i.e. it still shows the original 'Abnormal' and 'Normal' labels.

To check how the replace method works, I tried the same thing with a small DataFrame:

df = pd.DataFrame({"column1": ["a", "b", "a"]})
print(df)
df["column1"].replace({"a": "x", "b": "y"}, inplace=True)
print(df)

Surprisingly, this does change the values from a to x and from b to y:

column1
0       a
1       b
2       a
  column1
0       x
1       y
2       x

I am baffled. Why does the replacement work on this DataFrame but not on my dataset?

Thanks in advance.

P.S.: Something like this worked for me:

df['class'] = df['class'].astype(str).str.replace('Abnormal', '0')

I have no idea why this produced the desired output when the previous attempts did not. Any help is appreciated.

QUEEN

1 Answer


It seems the values in your 'class' column are bytes, not str, so decode them before replacing:

df['class'] = df['class'].str.decode('utf-8').replace({'Abnormal': 0, 'Normal': 1})
print(df)

# Output
   pelvic_incidence  pelvic_tilt  lumbar_lordosis_angle  sacral_slope  pelvic_radius  degree_spondylolisthesis class
0         74.094731    18.823727              76.032156     55.271004     128.405731                 73.388216     0
1         87.679087    20.365613              93.822416     67.313473     120.944829                 76.730629     0
2         48.259920    16.417462              36.329137     31.842457      94.882336                 28.343799     0
3         38.505273    16.964297              35.112814     21.540976     127.632875                  7.986683     1
4         54.920858    18.968430              51.601455     35.952428     125.846646                  2.001642     1
5         44.362490     8.945435              46.902096     35.417055     129.220682                  4.994195     1
6         48.318931    17.452121              48.000000     30.866809     128.980308                 -0.910941     1
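
To illustrate why the original `replace` call had no effect, here is a minimal sketch. It assumes the 'class' values arrive as bytes, as suggested above; the small DataFrame is a hypothetical stand-in for the loaded data, not the real file.

```python
import pandas as pd

# Hypothetical stand-in for the loaded data: the labels are bytes, not str.
df = pd.DataFrame({"class": [b"Abnormal", b"Normal", b"Abnormal"]})

# A str key never compares equal to a bytes value, so nothing matches.
print(b"Abnormal" == "Abnormal")  # False
unchanged = df["class"].replace({"Abnormal": 0, "Normal": 1})
print(unchanged.tolist())

# After decoding, the values are str and the same mapping applies.
decoded = df["class"].str.decode("utf-8")
labels = decoded.replace({"Abnormal": 0, "Normal": 1})
print(labels.tolist())
```

The small test DataFrame in the question was built from plain str values, which would explain why the same `replace` call worked there.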
Corralien
  • @QUEEN. Does it solve your problem now? – Corralien Jan 25 '22 at 18:44
  • I tried this but it says `ValueError: Columns must be same length as key`. Actually, something like `df['class'] = df['class'].astype(str).str.replace('Abnormal', '0')` worked for me, but I don't know why. I am editing my question accordingly. – QUEEN Jan 25 '22 at 18:44
  • Is it possible for you to share your data? – Corralien Jan 25 '22 at 18:47
  • Thank you so much. Finally, `df['class'] = df['class'].str.decode('utf-8').replace({'Abnormal': 0, 'Normal': 1})` worked! Any idea why this was successful? – QUEEN Jan 25 '22 at 18:47
  • I have attached some of the rows and columns in the question. The entire dataset is too large to be added. – QUEEN Jan 25 '22 at 18:48
  • I'm really surprised too! I don't know why. (don't forget to accept my answer :)) – Corralien Jan 25 '22 at 18:49
  • The problem with copy/paste is that we lose the encoding. – Corralien Jan 25 '22 at 18:49
  • Ohh, I wasn't aware. Which copy/paste are you talking about? How does the encoding change with copy/paste? – QUEEN Jan 25 '22 at 19:10
  • When you load your file with `loadarff` it seems text fields are converted to bytes because the file is probably opened in binary mode. – Corralien Jan 25 '22 at 19:14
  • Okay, I got it. Thank you :) – QUEEN Jan 25 '22 at 19:14