I have a pandas DataFrame with a single column. Each cell in that column holds a list/array of numbers, and every list has the same length, 100.

We need to turn each list element into its own column value. In other words, we want a DataFrame with 100 columns, where each column holds one item of the list/array.

Something like this:

[image not available]

becomes:

[image not available]

It can be done with iterrows() as shown below, but we have around 1.5 million rows and need a scalable solution, as iterrows() would take a lot of time.

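import pandas as pd

# One output column per list element (4 in this example; 100 in the real data).
cols = [f'col_{i}' for i in range(0, 4)]
df_inter = pd.DataFrame(columns=cols)
# Append each row's list as a new row of df_inter (slow: one .loc write per row).
for index, row in df.iterrows():
    df_inter.loc[len(df_inter)] = row['message']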
Mayank Porwal

2 Answers

You can do this:

In [28]: df = pd.DataFrame({'message':[[1,2,3,4,5], [3,4,5,6,7]]})

In [29]: df
Out[29]: 
           message
0  [1, 2, 3, 4, 5]
1  [3, 4, 5, 6, 7]

In [30]: res = pd.DataFrame(df.message.tolist(), index= df.index)

In [31]: res
Out[31]: 
   0  1  2  3  4
0  1  2  3  4  5
1  3  4  5  6  7
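
The same idea can also carry the `col_{i}` names used in the question's loop. The following is only a minimal sketch with made-up sample data, and it assumes every list has the same length (as the question states), so the width is read from the first row:

```python
import pandas as pd

# Made-up sample data; the real frame would hold 100-element lists.
df = pd.DataFrame({'message': [[1, 2, 3, 4], [5, 6, 7, 8]]})

# Assumption: all lists have equal length, so the first row gives the width.
n = len(df['message'].iloc[0])
cols = [f'col_{i}' for i in range(n)]

# Build the wide frame in one call instead of appending row by row.
res = pd.DataFrame(df['message'].tolist(), index=df.index, columns=cols)
print(res)
```

Passing `index=df.index` keeps the original row labels, which matters if the frame was filtered or re-indexed beforehand.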
Mayank Porwal

I think this would work:

df.message.apply(pd.Series)

To use dask to scale (assuming it is installed):

import dask.dataframe as dd

ddf = dd.from_pandas(df, npartitions=8)
ddf.message.apply(pd.Series, meta={0: 'object'})
Brian Larsen
  • This is quite slow when it comes to large data frames. – Mayank Porwal Oct 08 '20 at 14:22
  • In that case you might consider using something like dask to help scale. – Brian Larsen Oct 08 '20 at 14:23
  • `df.apply(pd.Series)` is really slow. My solution performs better than that. You don't need Dask just to do this. – Mayank Porwal Oct 08 '20 at 14:25
  • No need for a big data library... just avoid using `.apply`. [Here](https://stackoverflow.com/questions/54432583/when-should-i-ever-want-to-use-pandas-apply-in-my-code) is more about problems with `.apply` that should be considered before using it. – edesz Oct 13 '20 at 12:42
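
To check the speed claim made in the comments on your own machine, a rough timing sketch along these lines may help. The data is made up and far smaller than 1.5 million rows, the helper names `via_tolist` and `via_apply` are hypothetical, and the timings will vary by machine and pandas version:

```python
import timeit

import pandas as pd

# Made-up data: 2,000 rows of 100-element lists.
df = pd.DataFrame({'message': [list(range(100)) for _ in range(2000)]})


def via_tolist(frame):
    # Approach from the first answer: one DataFrame constructor call.
    return pd.DataFrame(frame['message'].tolist(), index=frame.index)


def via_apply(frame):
    # Approach from the second answer: builds one Series per row.
    return frame['message'].apply(pd.Series)


# Time three runs of each approach on the same data.
t_tolist = timeit.timeit(lambda: via_tolist(df), number=3)
t_apply = timeit.timeit(lambda: via_apply(df), number=3)
print(f'tolist: {t_tolist:.3f}s  apply(pd.Series): {t_apply:.3f}s')
```

Both functions should produce the same 2,000 x 100 frame, so any difference in the printed times comes from how the frame is built.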