
Reproducible code for the data:

import pandas as pd

# Avoid naming the variable `dict`, which would shadow the built-in type.
data = {"a": "[1,2,3,4]", "b": "[1,2,3,4]"}
d = pd.DataFrame(list(data.items()))

d

    0   1
 0  a   [1,2,3,4]
 1  b   [1,2,3,4]

I want to split the values in column 1 and create an individual row for each split value.

expected output:

     0    1
  0  a    1
  1  a    2
  2  a    3
  3  a    4
  4  b    1
  5  b    2
  6  b    3
  7  b    4

Should I remove the brackets first and then split the values? I have no idea how to approach this. Is there any reference that would help me solve it?

vishnu prashanth
  • Possible duplicate of [Pandas expand rows from list data available in column](https://stackoverflow.com/questions/39011511/pandas-expand-rows-from-list-data-available-in-column) – Rahul Chawla Jun 14 '18 at 13:22

2 Answers


Based on the logic from the answer linked in the comments above:

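# Parse each string into a list (eval is only safe on trusted input),
# spread the elements across columns, then stack them into rows.
s = d[1]\
    .apply(lambda x: pd.Series(eval(x)))\
    .stack()

# Drop the inner index level so that s aligns with the rows of d.
s.index = s.index.droplevel(-1)
s.name = "split"
d.join(s).drop(1, axis=1)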
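
To see what each step of that chain produces, here is a minimal step-by-step sketch on the question's data. It swaps `eval` for `ast.literal_eval`, which is a substitution and not part of the original snippet; the names `wide` and `result` are illustrative only.

```python
import ast
import pandas as pd

# Same data as in the question.
d = pd.DataFrame(list({"a": "[1,2,3,4]", "b": "[1,2,3,4]"}.items()))

# Step 1: parse each string into a list and spread it across columns.
# `wide` has one row per original row and one column per list element.
wide = d[1].apply(lambda x: pd.Series(ast.literal_eval(x)))

# Step 2: stack() moves those columns into an inner index level,
# giving a Series indexed by (original row, position in list).
s = wide.stack()

# Step 3: drop the inner level so the Series index matches d's index.
s.index = s.index.droplevel(-1)
s.name = "split"

# Step 4: join on the index (each row of d is repeated once per element)
# and drop the original string column.
result = d.join(s).drop(1, axis=1)
print(result)
```

Note that `result` keeps the repeated original index (0, 0, 0, 0, 1, 1, 1, 1); calling `reset_index(drop=True)` on it would renumber the rows if that is needed.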
koPytok
  • Thanks Kopytok, This solution also worked. I felt the other one was easier to understand and interpret. Thanks again for the solution :) – vishnu prashanth Jun 14 '18 at 13:30

Because your cells contain strings that look like lists (not actual lists), you can parse them with eval, provided the data is trusted:

dict_v = {"a": "[1,2,3,4]", "b": "[1,2,3,4]"}
df = pd.DataFrame(list(dict_v.items()))
# Name column 0 'l' and make it the index so it survives stack() and reset_index().
df = (df.rename(columns={0: 'l'}).set_index('l')[1]
        .apply(lambda x: pd.Series(eval(x))).stack()
        .reset_index().drop(columns='level_1').rename(columns={'l': 0, 0: 1}))

Another way, probably faster, is to build a DataFrame directly from the parsed lists:

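# Start again from the original two-column frame, since df was overwritten above.
df = pd.DataFrame(list(dict_v.items()))
df = (pd.DataFrame(df[1].apply(eval).tolist(), index=df[0])
        .stack().reset_index(level=1, drop=True)
        .reset_index(name=1))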

The output is:

   0  1
0  a  1
1  a  2
2  a  3
3  a  4
4  b  1
5  b  2
6  b  3
7  b  4

All the rename calls are there only to reproduce the column labels of your input and output exactly.
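
As a possible alternative to the approaches above, here is a minimal sketch that uses `ast.literal_eval` instead of `eval` and `DataFrame.explode` instead of `stack`. It is a different technique from the ones shown in this answer, and it assumes a pandas version of 1.1 or later (`explode` exists since 0.25, its `ignore_index` parameter since 1.1). The name `out` is illustrative.

```python
import ast
import pandas as pd

df = pd.DataFrame(list({"a": "[1,2,3,4]", "b": "[1,2,3,4]"}.items()))

# literal_eval only accepts Python literals, so unlike eval it
# does not execute arbitrary code found in a cell.
df[1] = df[1].apply(ast.literal_eval)

# explode() emits one row per list element; ignore_index=True
# renumbers the rows instead of repeating the original index.
out = df.explode(1, ignore_index=True)

# Exploded values come back with object dtype, so cast them explicitly.
out[1] = out[1].astype(int)
print(out)
```

No renaming is needed here because `explode` keeps the original column labels 0 and 1.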

Ben.T