1

i have a sample dataset

id           value
[10,10]     ["apple","orange"]  
[15,67]      ["banana","orange"] 
[12,34,45]   ["apple","banana","orange"] 

i want to convert this into

id1 id2 id3            value1 value2 value3
10  10  nan           apple  orange   nan
15  67  nan           banana orange   nan
10  10  45            apple  banana  orange
  • i solved this problem earlier using if else conditions
  • but data could be dynamic so it may have more then 3 values.
  • How to split into multiple column with renaming it as mentioned
Erfan
  • 40,971
  • 8
  • 66
  • 78
raju
  • 119
  • 9

2 Answers2

3

We can reconstruct your data with tolist and pd.DataFrame. Then concat everything together again:

d = [pd.DataFrame(df[col].tolist()).add_prefix(col) for col in df.columns]
df = pd.concat(d, axis=1)

   id0  id1   id2  value0  value1  value2
0   10   10   NaN   apple  orange    None
1   15   67   NaN  banana  orange    None
2   12   34  45.0   apple  banana  orange
Erfan
  • 40,971
  • 8
  • 66
  • 78
  • can you explain me your code df[col].tolist()).add_prefix(col) – raju Sep 23 '20 at 23:14
  • `tolist` converts your pandas Series (column) to a Python list: `[[10, 10], [15, 67], [12, 34, 45]]`. `add_prefix` Adds a prefix for your column names, because those are now `0, 1, 2 .. n`. Try running each part and you will see it;s pretty straightforward – Erfan Sep 23 '20 at 23:16
1

Try this code.

df = pd.DataFrame({"id":[[10, 10], [15, 67], [12, 34, 45]],
                   "value":[['a', 'o'], ['b', 'o'], ['a', 'b', 'o']]})
    
output = pd.DataFrame()
for col in df.columns:
    output = pd.concat([output,
                       pd.DataFrame(df[col].tolist(), columns = [col + str(i+1) for i in range(df[col].apply(len).max())])],
                        axis = 1)

Key code is pd.DataFrame(df[col].tolist(), columns = [col + str(i+1) for i in range(df[col].apply(len).max())])].

Here,df[col].apply(len).max() is maximum number of elements among lists in a column. df[col].tolist() converts df[col] into nested list, and remake it as DataFrame.

Gilseung Ahn
  • 2,598
  • 1
  • 4
  • 11