get_dummies(), Exception: Data must be 1-dimensional

Question

I have this data

I am trying to apply this:

one_hot = pd.get_dummies(df)

But I get this error:

Here is my code up until then:

# Import modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import tree
df = pd.read_csv('AllMSAData.csv')
df.head()
corr_matrix = df.corr()
corr_matrix
df.describe()
# Get features and targets
labels = np.array(df['CurAV'])
# Remove the labels from the features
# axis 1 refers to the columns
df = df.drop('CurAV', axis = 1)
# Saving feature names for later use
feature_list = list(df.columns)
# Convert to numpy array
df = np.array(df)

What do you want get dummies for? You need to pass it a Series, for instance. — ALollz, Dec 01 '18 at 00:42
I need to change my categorical data into numerical to perform random forest — , Dec 01 '18 at 00:44

sacuL · Accepted Answer · 2018-12-01T00:52:26.537

IMO, the documentation should be updated: it says pd.get_dummies accepts array-like data, and a 2-D numpy array is array-like (although there is no formal definition of array-like). In practice, though, it rejects multi-dimensional arrays. That is what happens in your code: after df = np.array(df), df is a 2-D numpy array rather than a dataframe.

Take this tiny example:

>>> df
   a  b  c
0  a  1  d
1  b  2  e
2  c  3  f

You can't get dummies on the underlying 2D numpy array:

>>> pd.get_dummies(df.values)

Exception: Data must be 1-dimensional

But you can get dummies on the dataframe itself:

>>> pd.get_dummies(df)
   b  a_a  a_b  a_c  c_d  c_e  c_f
0  1    1    0    0    1    0    0
1  2    0    1    0    0    1    0
2  3    0    0    1    0    0    1

Or on the 1D array underlying an individual column:

>>> pd.get_dummies(df['a'].values)
   a  b  c
0  1  0  0
1  0  1  0
2  0  0  1
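Applied to the workflow in the question, this suggests calling get_dummies while the data is still a dataframe and converting to a numpy array only afterwards. Here is a minimal sketch, assuming a made-up stand-in for AllMSAData.csv: the `Population` column and all values are hypothetical, and `dtype=float` is my own choice so that the final array comes out numeric.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for AllMSAData.csv; the values are invented.
df = pd.DataFrame({
    "State": ["TX", "CA", "TX"],
    "Prev_CS_Tier": ["A", "B", "A"],
    "Population": [10, 20, 30],  # hypothetical numeric feature
    "CurAV": [100.0, 250.0, 175.0],
})

# Separate the target from the features, as in the question.
labels = np.array(df["CurAV"])
features = df.drop("CurAV", axis=1)

# Encode while the data is still a DataFrame: object/category columns
# become indicator columns, numeric columns pass through unchanged.
features = pd.get_dummies(features, dtype=float)

# Save the feature names after encoding, then convert to a numpy array.
feature_list = list(features.columns)
X = np.array(features)
```

Note that `feature_list` is taken after encoding, so it matches the columns of `X`.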

edited Dec 01 '18 at 00:52

answered Dec 01 '18 at 00:48


What would you recommend for my case then? – Dec 01 '18 at 00:52
I noticed that when I call pd.get_dummies(df) before the features and targets part I do not get an error but then it does nothing to the data – Dec 01 '18 at 00:53
use `pd.get_dummies(df[['columns', 'to', 'dummify']])` – sacuL Dec 01 '18 at 00:53
KeyError: "['columns' 'to' 'dummify'] not in index" – Dec 01 '18 at 00:54
That was meant as a placeholder, replace columns to dummify with the columns you want. For example, if you want to get dummies for `State` and `Prev_CS_Tier`, use `pd.get_dummies(df[['State', 'Prev_CS_Tier']])` – sacuL Dec 01 '18 at 00:57
So I have to specify each column I want to get dummies for? – Dec 01 '18 at 00:58
That would work, or you could use the `columns` argument. Take a look at the [docs](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html) for that argument, it explains it better than I can... but if you just do `pd.get_dummies(df)` *before* you transform `df` into a numpy array, it will just convert all `object` and `category` columns to dummies, which *might* be what you're looking for (but you should think about your data; for instance, I would personally convert `Prev_CS_Tier` to an ordinal, rather than a dummy) – sacuL Dec 01 '18 at 01:06
Still not working but no worries, I will figure it out – Dec 01 '18 at 01:08
I just want to change all my categorical data and nothing to the other numerical data variables but this is still doing nothing – Dec 01 '18 at 01:15
You'll need to concatenate the result into your original dataframe. `get_dummies` is not done in place, it returns its own dataframe. – sacuL Dec 01 '18 at 01:16
Okay, should I use get_dummies() before or after I convert my data into an array? – Dec 01 '18 at 01:17
Before. See earlier comments and the answer I posted – sacuL Dec 01 '18 at 01:18
Okay. After I create my dummies, and before I split my data into training and testing sets, should I drop all my categorical variables? If I do not, I will get errors – Dec 01 '18 at 01:34
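To pull the suggestions in this thread together, here is a rough sketch of the `columns` argument, the concatenate-back approach, and an ordinal encoding for `Prev_CS_Tier`. The data, the `Population` column, and the tier order `["A", "B"]` are all assumptions made for illustration, not taken from the real dataset.

```python
import pandas as pd

# Hypothetical data; only the column names State and Prev_CS_Tier
# come from the thread.
df = pd.DataFrame({
    "State": ["TX", "CA", "TX"],
    "Prev_CS_Tier": ["A", "B", "A"],
    "Population": [10, 20, 30],
})

# Option 1: columns= encodes only the named columns. The result is a new
# DataFrame in which State is replaced by its indicator columns.
encoded = pd.get_dummies(df, columns=["State"])

# Option 2: encode a subset, then concatenate it back by hand.
# get_dummies is not in place, so the original column is dropped explicitly.
dummies = pd.get_dummies(df[["State"]])
combined = pd.concat([df.drop(columns=["State"]), dummies], axis=1)

# Ordinal alternative for a tier-like column: map ordered categories to
# integer codes. The order ["A", "B"] is an assumption.
tier = pd.Categorical(df["Prev_CS_Tier"], categories=["A", "B"], ordered=True)
encoded["Prev_CS_Tier"] = tier.codes
```

With either option the original `State` column no longer appears in the result, so there should be nothing left to drop for the columns that were encoded.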
