How can I expand NumPy arrays in a DataFrame column into their own columns?

Question

I have a weird problem. I have the following dataframe:

embedding
0   [0.0, 0.0, 0.0, 0.6223578453063965, 0.0, 0.270...
1   [0.0, 0.0, 0.0, 0.6223578453063965, 0.0, 0.270...
2   [0.0, 0.0, 0.0, 0.6223578453063965, 0.0, 0.270..

It's a dataframe with one column named `embedding`. Each row holds an array of about 100 items, and all the arrays are the same size.

How can I expand it so that each item in the array gets its own column in the dataframe? Is that possible, or do I have to extract the NumPy arrays and create a new dataframe from the nested array?

Update: I don't have names for the columns, and the names aren't important to me. What matters is that the order of the NumPy array is preserved.

Update 2: as requested in the comments:

print(Xtest_e1.head(2).to_dict())
{'embedding': {0: array([0.        , 0.        , 0.        , 0.62235785, 0.        ,
       0.27049118, 0.        , 0.31094068, 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.4330532 , 0.        ,
       0.        , 0.25157961, 0.        , 0.        , 0.        ,
       0.40683705, 0.01569915, 0.        , 0.        , 0.        ,
       0.13090582, 0.        , 0.49955425, 0.06970194, 0.29155406,
       0.        , 0.        , 0.27342197, 0.        , 0.        ,
       0.        , 0.04415211, 0.        , 0.03908829, 0.        ,
       0.07673171, 0.33199945, 0.        , 0.51759815, 0.        ,
       0.47191489, 0.45380819, 0.13475986, 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.08000553,
       0.        , 0.02991109, 0.        , 0.50515431, 0.        ,
       0.24663273, 0.        , 0.50839704, 0.        , 0.        ,
       0.05281948, 0.44884402, 0.        , 0.44542992, 0.15376966,
       0.        , 0.        , 0.        , 0.39128256, 0.49497205,
       0.        , 0.        ]), 1: array([0.        , 0.        , 0.        , 0.62235785, 0.        ,
       0.27049118, 0.        , 0.31094068, 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.4330532 , 0.        ,
       0.        , 0.25157961, 0.        , 0.        , 0.        ,
       0.40683705, 0.01569915, 0.        , 0.        , 0.        ,
       0.13090582, 0.        , 0.49955425, 0.06970194, 0.29155406,
       0.        , 0.        , 0.27342197, 0.        , 0.        ,
       0.        , 0.04415211, 0.        , 0.03908829, 0.        ,
       0.07673171, 0.33199945, 0.        , 0.51759815, 0.        ,
       0.47191489, 0.45380819, 0.13475986, 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.08000553,
       0.        , 0.02991109, 0.        , 0.50515431, 0.        ,
       0.24663273, 0.        , 0.50839704, 0.        , 0.        ,
       0.05281948, 0.44884402, 0.        , 0.44542992, 0.15376966,
       0.        , 0.        , 0.        , 0.39128256, 0.49497205,
       0.        , 0.        ])}}

Duplicate of https://stackoverflow.com/questions/35491274/pandas-split-column-of-lists-into-multiple-columns ? — Nick ODell, Jun 09 '21 at 21:47
@NickODell But doesn't that solution require me to know the column names in advance? I don't care about the column names, and I don't want to name each column by hand. They can be column1, column2, etc., since only the order matters. — Lostsoul, Jun 09 '21 at 21:48
If you need to generate names for the columns, you could use a list comprehension like `['column%d' % i for i in range(100)]`. — Nick ODell, Jun 09 '21 at 21:52
Can you add your dataframe as a dict? Just the first 1-2 rows: `print(df.head(2).to_dict())` — Umar.H, Jun 09 '21 at 21:54
Awesome. The output is still a little unclear to me, but can you try `s = df.stack().explode().reset_index(1)`; `s['level_1'] = s['level_1'] + s.groupby(level=0).cumcount().astype(str)`; `s.set_index('level_1',append=True).unstack(1)`? — Umar.H, Jun 09 '21 at 22:10
@Umar.H I think that did it. I'm testing now. The outcome looks like what I wanted so far. — Lostsoul, Jun 09 '21 at 22:18
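A sketch of the same explode/unstack idea that keeps the array order. `unstack` orders the new columns by their sorted labels, so string labels built with `cumcount().astype(str)` put `embedding10` before `embedding2`. Pivoting on the integer position and adding the prefix afterwards avoids that. The toy `df` is an illustrative stand-in for the real data; `Series.explode` accepts NumPy arrays as list-likes.

```python
import numpy as np
import pandas as pd

# Toy stand-in: two rows, each holding a 12-element array.
df = pd.DataFrame({"embedding": [np.arange(12) * 0.1, np.arange(12) * 1.0]})

# One row per array element; each element keeps its original row label.
s = df["embedding"].explode()

# Integer position of each element within its own array (0, 1, 2, ...).
pos = s.groupby(level=0).cumcount()

# Use (row label, position) as the index, then pivot the positions into columns.
# Integer positions sort numerically, so the column order matches the array order.
s.index = pd.MultiIndex.from_arrays([s.index, pos])
wide = s.unstack().astype(float).add_prefix("embedding")
print(wide.columns.tolist()[:3])
```

With 12 or more elements per array this is where the string-label version and the integer-position version visibly differ.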

Corralien · Accepted Answer · 2021-06-09T22:34:54.943


Is this what you expect?

>>> pd.DataFrame(Xtest_e1["embedding"].tolist()).add_prefix("c")

    c0   c1   c2        c3   c4  ...  c72       c73       c74  c75  c76
0  0.0  0.0  0.0  0.622358  0.0  ...  0.0  0.391283  0.494972  0.0  0.0
1  0.0  0.0  0.0  0.622358  0.0  ...  0.0  0.391283  0.494972  0.0  0.0

[2 rows x 77 columns]
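A self-contained version of the one-liner above. Passing `index=` keeps the original row labels, which matters when the frame's index is not 0..n-1 (for example, if rows were shuffled or filtered), so the new columns can be joined back onto the original frame. The toy `xtest` frame is a hypothetical stand-in for `Xtest_e1`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Xtest_e1, with a non-default index.
xtest = pd.DataFrame({"embedding": [np.array([0.0, 0.62, 0.27]),
                                    np.array([0.31, 0.0, 0.43])]},
                     index=[17, 4])

# One column per array element, in array order, keeping the original row labels.
expanded = pd.DataFrame(xtest["embedding"].tolist(), index=xtest.index).add_prefix("c")

# Optionally replace the array column with the expanded columns.
result = xtest.drop(columns="embedding").join(expanded)
print(result)
```

`join` aligns on the index, which is why keeping `xtest.index` on the expanded frame is important.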

edited Jun 09 '21 at 22:34

answered Jun 09 '21 at 22:24


I get a weird error: `TypeError: 'NoneType' object is not iterable` – Lostsoul Jun 09 '21 at 22:36
The shape is (43206, 1), and the data is what I posted above. The column type is 'object'; does that make a difference? `Xtest_e1.columns` returns `Index(['embedding'], dtype='object')` – Lostsoul Jun 09 '21 at 22:42
`pd.DataFrame(Xtest_e1["embedding"].to_dict()).add_prefix("c")` – Lostsoul Jun 10 '21 at 00:32
Are you sure the `embedding` column contains lists? This snippet works with your sample. – Corralien Jun 10 '21 at 04:35
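One way to check whether every row really holds an array. This is a minimal sketch, assuming valid rows are 1-D arrays (or lists/tuples) and that the most common length is the intended one; the toy data and the `row_length` helper are hypothetical. It lists rows that are missing or have an unexpected length, which are worth inspecting before expanding; it does not by itself pinpoint the cause of the `TypeError` above.

```python
import numpy as np
import pandas as pd

# Toy data: row 1 is missing and row 2 has the wrong length.
df = pd.DataFrame({"embedding": [np.array([0.0, 0.5]), None,
                                 np.array([0.1, 0.2, 0.3]), np.array([0.4, 0.6])]})

def row_length(value):
    """Length of a list/array row, or -1 if the row is not list-like (e.g. None)."""
    return len(value) if isinstance(value, (list, tuple, np.ndarray)) else -1

lengths = df["embedding"].map(row_length)
expected = lengths[lengths >= 0].mode().iloc[0]  # most common valid length

# Rows that are missing or whose length differs from the expected one.
bad_rows = df[lengths != expected]
print(bad_rows)
```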
