Use get_dummies in categorical data

Question

I need to train a decision tree classifier on a dataset. The classifier can't handle categorical data, but this dataset has columns with categorical data like this:

I know this can be done with the get_dummies function, but I couldn't get it to work. I first read the dataset like this:

import pandas as pd

def load_data(fname):
    """Load CSV file"""
    df = pd.read_csv(fname)
    nc = df.shape[1]
    matrix = df.values  # NumPy array, no longer a DataFrame
    table_X = matrix[:, 2:]
    table_y = matrix[:, 81]
    features_names = df.columns.values[1:]
    target = df.columns.values[81]
    return table_X, table_y

table_X, table_y = load_data("dataset.csv")

pd.get_dummies(table_X)

When I run this, I get this exception: `Exception: Data must be 1-dimensional`

What am I doing wrong?

------------------------------- EDIT ------------------------------------

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y = le.fit_transform(table_y)
le.classes_

le.transform(['<200000', '>400000', '[200000,400000]'])

To apply the decision tree algorithm:

from sklearn import tree

dtc_Gini = tree.DecisionTreeClassifier() #criterion='gini'
dtc_Gini1 = dtc_Gini.fit(table_X, y)

This raises `ValueError: could not convert string to float: 'RL'`

score 0 · Answer 1 · answered Jun 11 '20 at 14:46

Right after pd.read_csv, call pd.get_dummies(df).

The thing is, I don't want to apply it to the whole dataset – Jun 11 '20 at 15:15
I have one column with categorical values stored as numbers, which I want to convert to strings. The ordinal numerical columns should stay as they are, and I want to transform them with index mapping. Finally, I want to use get_dummies – Jun 11 '20 at 15:19
Oh, then use `data = pd.get_dummies(data, columns=[""])`, putting the names of the columns you want to encode in the list – Jun 11 '20 at 17:53
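
The three steps described in the comments above (numeric codes to strings, index mapping for ordinal columns, then get_dummies on selected columns only) could look roughly like this. It is only a sketch: the column names, values, and the `quality_order` mapping are made up for illustration and are not taken from the asker's dataset.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "zone_code": [20, 60, 20],       # numeric codes that are really categories
    "quality": ["Fa", "Gd", "TA"],   # ordinal strings
    "zoning": ["RL", "RM", "RL"],    # nominal strings
    "area": [8450, 9600, 11250],     # genuinely numeric, left untouched
})

# Treat the numeric codes as labels by converting the column to strings.
df["zone_code"] = df["zone_code"].astype(str)

# Map ordinal levels to integers that preserve their order (assumed ordering).
quality_order = {"Fa": 0, "TA": 1, "Gd": 2}
df["quality"] = df["quality"].map(quality_order)

# One-hot encode only the listed nominal columns; the rest pass through.
encoded = pd.get_dummies(df, columns=["zone_code", "zoning"])
print(encoded.columns.tolist())
```

`astype` is also the usual way to change a column's type in general. Note that `map` returns NaN for any value missing from the dictionary, so the mapping has to cover every level that occurs in the column.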

score 0 · Accepted Answer · answered Jun 11 '20 at 15:28

Based on this answer: "get_dummies(), Exception: Data must be 1-dimensional". df.values returns a 2-D NumPy array, and get_dummies() expects a Series or DataFrame, so it seems you have to convert table_X back to a DataFrame before applying get_dummies(). Alternatively, you can avoid df.values altogether.

Try this:

import pandas as pd

def load_data(fname):
    """Load CSV file"""
    df = pd.read_csv(fname)
    # iloc slices by position and keeps the result as a DataFrame / Series
    table_X = df.iloc[:, 2:]
    table_y = df.iloc[:, 81]
    return table_X, table_y

table_X, table_y = load_data("dataset.csv")

pd.get_dummies(table_X)

And let me know if it works.
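
To see the difference on something small, here is a minimal sketch with a made-up two-column frame standing in for the CSV (the column names and values are assumptions, not the real data). It passes the `.values` array to get_dummies first, then the `iloc` slice.

```python
import pandas as pd

# Hypothetical toy frame standing in for the CSV.
df = pd.DataFrame({"zoning": ["RL", "RM", "RL"], "area": [8450, 9600, 11250]})

# df.values is a 2-D NumPy array, which get_dummies does not accept.
try:
    pd.get_dummies(df.values)
    array_failed = False
except Exception as exc:
    array_failed = True
    print(f"array input failed: {exc}")

# Slicing with iloc keeps a DataFrame, so get_dummies works.
table_X = df.iloc[:, 0:]
dummies = pd.get_dummies(table_X)
print(dummies.dtypes)
```

The string column is replaced by one indicator column per category, while the numeric column is passed through unchanged.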

DavideBrex

It works; at least I get a table when I run `pd.get_dummies(table_X)`. However, when I try to run the decision tree algorithm I still get an error – Jun 11 '20 at 15:37
I have edited the question and added the decision tree part – Jun 11 '20 at 15:39
The error you get is probably caused by values in table_X that are not floats (there are still strings in it). Check this with `table_X.dtypes` – DavideBrex Jun 11 '20 at 15:45
How can I change a column's type? – Jun 11 '20 at 16:34
To apply the decision tree algorithm I need to remove all NaN, infinity, or values too large for dtype('float32'), because after applying get_dummies to table_X I got this error: `ValueError: Input contains NaN, infinity or a value too large for dtype('float32').` – Jun 11 '20 at 16:40
So do you get the error when you use get_dummies(), or afterwards? To fix the NaN values, see this answer: [here](https://datascience.stackexchange.com/a/11933) – DavideBrex Jun 11 '20 at 16:43
I see the NaN error after running `data = pd.get_dummies(table_X)`, then `dtc_Gini = tree.DecisionTreeClassifier()  # criterion='gini'`, then `dtc_Gini = dtc_Gini.fit(data.values, y)` – Jun 11 '20 at 16:46
You should tell me the exact line where you get the error, otherwise it's difficult to understand your problem. If the error is raised on the get_dummies line, then fill the NaN values first and call get_dummies() afterwards. – DavideBrex Jun 11 '20 at 17:12
I get the error when I call fit. I use get_dummies before that and store the output in the variable data, then call fit with data.values and y (as in my comment above), and that's when I get the error – Jun 11 '20 at 17:14
OK, then there are NaN values in your data table. Fix that by following the link I sent in the comment above – DavideBrex Jun 11 '20 at 17:16
Actually, I get the "could not convert string to float" error too – Jun 11 '20 at 17:16
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/215763/discussion-between-ack31-and-davidebrex). – Jun 11 '20 at 17:19
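
For reference, the order of operations discussed in these comments (fill NaN, encode, check dtypes, then fit) might be sketched as below. The toy data, the column names, and the fill strategies (a placeholder label for strings, the median for numbers) are all assumptions for illustration; the right choice depends on the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data with a missing value in each kind of column.
table_X = pd.DataFrame({
    "zoning": ["RL", None, "RM", "RL"],
    "area": [8450.0, 9600.0, np.nan, 11250.0],
})
table_y = pd.Series(["<200000", ">400000", "[200000,400000]", "<200000"])

# Fill gaps before encoding (assumed strategies, not a recommendation).
table_X["zoning"] = table_X["zoning"].fillna("Missing")
table_X["area"] = table_X["area"].fillna(table_X["area"].median())

# Encode the remaining string columns.
data = pd.get_dummies(table_X)

# Sanity checks: no NaN and no object (string) columns should be left.
assert not data.isna().any().any()
assert (data.dtypes != object).all()

# Encode the target labels and fit the tree.
y = LabelEncoder().fit_transform(table_y)
clf = DecisionTreeClassifier(random_state=0).fit(data, y)
print(clf.predict(data))
```

If either check fails on the real data, `data.dtypes` and `data.isna().sum()` show which columns still need attention before calling `fit`.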
