
I have a Pandas DataFrame df that stores some numeric values:

print(df)

       value 
0          0
1          2
2          4
3          5
4          8

And I have a function that converts a numerical value to a one-hot vector:

print(to_categorical(0))
[1 0 0 0 0 0 0 0 0 0]

print(to_categorical(5))
[0 0 0 0 0 1 0 0 0 0]

etc...
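(For completeness, a minimal sketch of a function with this behaviour, assuming 10 classes like keras.utils.to_categorical; mine may differ in the details.)

import numpy as np

def to_categorical(values, num_classes=10):
    # Hypothetical sketch: one row per value, with a 1 at that value's index.
    values = np.atleast_1d(values).astype(int)
    one_hot = np.zeros((values.size, num_classes), dtype=int)
    one_hot[np.arange(values.size), values] = 1
    return one_hot.squeeze()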

So, I can call my function on my column of numeric values:

print(to_categorical(df['value']))

[[1 0 0 0 0 0 0 0 0 0]
 [0 0 1 0 0 0 0 0 0 0]
 [0 0 0 0 1 0 0 0 0 0]
 [0 0 0 0 0 1 0 0 0 0]
 [0 0 0 0 0 0 0 0 1 0]]

And now I want to store my results as a new column. Here is what I expect for my example:

df['one-hot'] = to_categorical(df['value'])
print(df)

        value                    one-hot
0          0       [1 0 0 0 0 0 0 0 0 0]
1          2       [0 0 1 0 0 0 0 0 0 0]
2          4       [0 0 0 0 1 0 0 0 0 0]
3          5       [0 0 0 0 0 1 0 0 0 0]
4          8       [0 0 0 0 0 0 0 0 1 0]

But this gives me an error, since pandas tries to flatten my array into multiple columns. How can I do that?

Nakeuh
  • `df['one-hot'] = to_categorical(df['value']).tolist()` – Sreeram TP Mar 28 '19 at 09:53
  • Possible duplicate of [How do I get a DataFrame Index / Series column as an array or list?](https://stackoverflow.com/questions/17241004/how-do-i-get-a-dataframe-index-series-column-as-an-array-or-list) – Georgy Mar 28 '19 at 11:08

1 Answer


First, I think working with lists in pandas is not a good idea, but it is possible by converting to lists:

df['one-hot'] = to_categorical(df['value']).tolist()
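A minimal runnable sketch of this approach, assuming to_categorical is keras.utils.to_categorical (or something with the same behaviour):

import pandas as pd
from tensorflow.keras.utils import to_categorical  # assumption: Keras-style helper

df = pd.DataFrame({'value': [0, 2, 4, 5, 8]})

# to_categorical returns a 2D array (n_rows x n_classes); .tolist() turns it
# into a list of row-lists, so pandas stores one whole vector per cell instead
# of trying to spread the array over multiple columns.
df['one-hot'] = to_categorical(df['value'], num_classes=10).tolist()
print(df)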
jezrael
  • In my case, I want a structure that stores a mapping between my values (a few thousand unique values) and the corresponding one-hot vectors (so vectors of a few thousand values). What do you think would be a better approach? – Nakeuh Mar 28 '19 at 09:54
  • @Nakeuh - Better to create a new DataFrame - `df1 = pd.DataFrame(to_categorical(df['value']), index=df.index)` (see the sketch after these comments) – jezrael Mar 28 '19 at 09:54
  • Hm, I see. I suppose the 'all in one DataFrame' approach is less computationally efficient? I think I will stay with the non-efficient way for now (I prefer having only one object and I am not really concerned about efficiency in my use case), but I will keep your suggestion in mind for the future. Thanks! – Nakeuh Mar 28 '19 at 10:02
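A minimal sketch of the separate-DataFrame approach suggested in the comments, again assuming a Keras-style to_categorical; df1 gets one column per class and shares df's index so the rows stay aligned:

import pandas as pd
from tensorflow.keras.utils import to_categorical  # assumption: Keras-style helper

df = pd.DataFrame({'value': [0, 2, 4, 5, 8]})

# One integer column per class instead of one list per cell; reusing df.index
# keeps df1 aligned with df for later joins or lookups.
df1 = pd.DataFrame(to_categorical(df['value'], num_classes=10), index=df.index)
print(df1)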