
I'm trying to one-hot encode one column of a dataframe.

enc = OneHotEncoder()
minitable = enc.fit_transform(df["ids"])

But I'm getting

DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19.

Is there a workaround for this?

Bob

1 Answer


I think you can use get_dummies:

df = pd.DataFrame({'ids':['a','b','c']})

print (df)
  ids
0   a
1   b
2   c

print (df.ids.str.get_dummies())
   a  b  c
0  1  0  0
1  0  1  0
2  0  0  1
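Note that `.str.get_dummies()` only works on string values. If the column holds integers, a minimal sketch (assuming a pandas version where `pd.get_dummies` accepts the `dtype` argument; the sample values are made up for illustration) is to call the top-level `pd.get_dummies` on the column, which needs no cast to str:

```python
import pandas as pd

# Illustrative integer column (values are an assumption, not from the question)
df = pd.DataFrame({'ids': [3, 1, 3, 2]})

# Top-level get_dummies works on non-string data; dtype=int gives 0/1 instead of booleans
dummies = pd.get_dummies(df['ids'], dtype=int)
print(dummies)
```

The column labels stay integers here, one column per distinct value in sorted order.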

EDIT:

If the column contains lists, first cast it to str, strip the surrounding [] and then call get_dummies with ', ' as the separator:

df = pd.DataFrame({'ids':[[0,4,5],[4,7,8],[5,1,2]]})

print(df)
         ids
0  [0, 4, 5]
1  [4, 7, 8]
2  [5, 1, 2]

print (df.ids.astype(str).str.strip('[]').str.get_dummies(', '))
   0  1  2  4  5  7  8
0  1  0  0  1  1  0  0
1  0  0  0  1  0  1  1
2  0  1  1  0  1  0  0
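If you would rather avoid the cast to str, one possible alternative (a sketch, assuming pandas 0.25 or later for `Series.explode`) is to explode the lists into one row per element, build the dummies, and collapse them back per original row:

```python
import pandas as pd

df = pd.DataFrame({'ids': [[0, 4, 5], [4, 7, 8], [5, 1, 2]]})

# One row per list element; the original row label is repeated in the index
exploded = df['ids'].explode()

# Encode each element, then take the max per original row to merge the indicators
dummies = pd.get_dummies(exploded, dtype=int).groupby(level=0).max()
print(dummies)
```

This should give the same 0/1 table as above, with the difference that the column labels remain integers rather than strings.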
jezrael
  • I would need to convert to string. Can it be done with integers and no casting? – Bob Dec 16 '16 at 13:42
  • Is it a problem to use `print (df.ids.str.get_dummies().astype(str))`? – jezrael Dec 16 '16 at 13:45
  • the code above won't work when ids is a list of integers – Bob Dec 16 '16 at 13:47
  • Though this is all over StackOverflow, `get_dummies` isn't the best practice to perform encoding, as it doesn't keep the memory of the encoding itself on unseen data. Different data may be encoded in the same way, which defies the purpose of encoding in the first place. – gented Jan 13 '20 at 16:34
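Regarding the last comment and the original warning: the warning in the question comes from passing a 1-D Series, while `OneHotEncoder` expects 2-D input. A minimal sketch, assuming scikit-learn 0.20 or later (where `OneHotEncoder` accepts string categories and has `handle_unknown`) and using made-up `train`/`unseen` frames, selects the column with double brackets and reuses the fitted encoder on new data:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data for illustration only
train = pd.DataFrame({'ids': ['a', 'b', 'c']})
unseen = pd.DataFrame({'ids': ['b', 'd']})  # 'd' never appeared during fitting

# handle_unknown='ignore' encodes unknown categories as an all-zero row
enc = OneHotEncoder(handle_unknown='ignore')

# train[['ids']] is a one-column DataFrame (2-D), which avoids the 1-D warning
train_encoded = enc.fit_transform(train[['ids']]).toarray()

# The fitted encoder keeps its categories, so new data gets the same column layout
unseen_encoded = enc.transform(unseen[['ids']]).toarray()

print(enc.categories_)
print(unseen_encoded)
```

Because the categories are stored on the fitted encoder, `unseen_encoded` has the same three columns as `train_encoded`, which is the property `get_dummies` does not give you on its own.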