Python: how to split a column into multiple columns in a dataframe with dynamic column naming

Question

I have a sample dataset:

id           value
[10,10]     ["apple","orange"]  
[15,67]      ["banana","orange"] 
[12,34,45]   ["apple","banana","orange"]

I want to convert this into:

id1 id2 id3            value1 value2 value3
10  10  nan           apple  orange   nan
15  67  nan           banana orange   nan
12  34  45            apple  banana  orange

I solved this problem earlier using if/else conditions,
but the data is dynamic, so a row may have more than 3 values.
How can I split each column into multiple columns, named as shown above?

score 3 · Accepted Answer · answered Sep 23 '20 at 23:02

We can rebuild each column as its own DataFrame with `tolist` and `pd.DataFrame`, then `concat` the pieces back together:

import pandas as pd

d = [pd.DataFrame(df[col].tolist()).add_prefix(col) for col in df.columns]
df = pd.concat(d, axis=1)

   id0  id1   id2  value0  value1  value2
0   10   10   NaN   apple  orange    None
1   15   67   NaN  banana  orange    None
2   12   34  45.0   apple  banana  orange
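
The output above numbers the new columns from 0, while the question asks for `id1`, `id2`, and so on. A minimal sketch of one way to get 1-based names, assuming the same sample data as in the question; the use of `rename` with a callable is a substitution for `add_prefix`, not part of the original answer:

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "id": [[10, 10], [15, 67], [12, 34, 45]],
    "value": [["apple", "orange"], ["banana", "orange"],
              ["apple", "banana", "orange"]],
})

# Expand each list column, then rename the positional labels 0, 1, 2, ...
# to col1, col2, col3, ... (shift by one to get 1-based names).
parts = [
    pd.DataFrame(df[col].tolist(), index=df.index)
      .rename(columns=lambda i, c=col: f"{c}{i + 1}")
    for col in df.columns
]
result = pd.concat(parts, axis=1)

print(result.columns.tolist())
```

Shorter lists are padded with missing values, so the number of new columns follows the longest list in each column without any hard-coded limit.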

Erfan

Can you explain your code `pd.DataFrame(df[col].tolist()).add_prefix(col)`? – raju Sep 23 '20 at 23:14
`tolist` converts your pandas Series (column) to a Python list: `[[10, 10], [15, 67], [12, 34, 45]]`. `add_prefix` adds a prefix to the column names, because those are now `0, 1, 2 .. n`. Try running each part and you will see it's pretty straightforward. – Erfan Sep 23 '20 at 23:16
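
The explanation in the comment can be followed one step at a time. This is a small sketch on the `id` column only; the variable names `s`, `nested`, `expanded`, and `prefixed` are illustrative and not from the answer:

```python
import pandas as pd

# The "id" column from the question, as a standalone Series.
s = pd.Series([[10, 10], [15, 67], [12, 34, 45]], name="id")

# Step 1: Series -> plain nested Python list.
nested = s.tolist()
print(nested)  # [[10, 10], [15, 67], [12, 34, 45]]

# Step 2: nested list -> DataFrame; columns get positional labels 0, 1, 2.
expanded = pd.DataFrame(nested)
print(expanded.columns.tolist())  # [0, 1, 2]

# Step 3: prepend the original column name to each label.
prefixed = expanded.add_prefix("id")
print(prefixed.columns.tolist())  # ['id0', 'id1', 'id2']
```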

score 1 · Answer 2 · answered Sep 23 '20 at 22:55

Try this code.

import pandas as pd

df = pd.DataFrame({"id":[[10, 10], [15, 67], [12, 34, 45]],
                   "value":[['a', 'o'], ['b', 'o'], ['a', 'b', 'o']]})

output = pd.DataFrame()
for col in df.columns:
    # Expand this column's lists into columns named col1, col2, ...
    output = pd.concat([output,
                        pd.DataFrame(df[col].tolist(), columns = [col + str(i+1) for i in range(df[col].apply(len).max())])],
                       axis = 1)

The key code is `pd.DataFrame(df[col].tolist(), columns = [col + str(i+1) for i in range(df[col].apply(len).max())])`.

Here, `df[col].apply(len).max()` is the maximum number of elements among the lists in a column. `df[col].tolist()` converts `df[col]` into a nested list, which is then rebuilt as a DataFrame.
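
To see how the column names are derived from the longest list, the expression can be unpacked into intermediate values. A sketch on the `value` column only; `lengths`, `width`, and `names` are hypothetical helper names introduced here for illustration:

```python
import pandas as pd

# The "value" column from this answer's sample data.
col = pd.Series([["a", "o"], ["b", "o"], ["a", "b", "o"]], name="value")

# Length of each list, row by row.
lengths = col.apply(len)
print(lengths.tolist())  # [2, 2, 3]

# The longest list determines how many columns are needed.
width = lengths.max()

# Build 1-based names: value1, value2, value3.
names = [col.name + str(i + 1) for i in range(width)]
print(names)  # ['value1', 'value2', 'value3']

# Rebuild the nested list as a DataFrame with those names.
expanded = pd.DataFrame(col.tolist(), columns=names)
```

Because `width` is computed from the data, the same code handles rows with more than three values.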
