I have a pandas data frame that looks something like this:

            EIN                                    file_num
0      10043280                          [2748, 3010, 4410]
1      10391479    [217, 829, 1753, 3131, 4376, 7428, 8048]
2      10430261                [362, 531, 3788, 4851, 5680]
3      10564355              [1165, 2117, 3498, 5101, 5666]
4      10589927  [1128, 2886, 3225, 4158, 5924, 6811, 7953]
...         ...                                         ...
1592  980634789              [5095, 5653, 5800, 6750, 8133]
1593  986001141                          [4864, 6973, 7147]
1594  990078306        [1154, 2011, 3554, 4619, 5640, 6353]
1595  990170479  [1391, 2783, 3798, 5459, 6115, 7348, 8116]
1596  990317895                    [4882, 5730, 7083, 7847]

[1597 rows x 2 columns]

As you can see, each EIN has multiple files. I want to expand the data frame so each file has its own row, something like this:

            EIN      file_num
0      10043280          2748
1      10043280          3010
2      10043280          4410

How can I accomplish this?

You can try `explode`:

df = df.explode('file_num')
BENY
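A minimal sketch of the `explode` approach on a two-row sample taken from the question's data (the name `out` is only for illustration). The `reset_index(drop=True)` call is an addition, not part of the answer: without it the exploded rows repeat the original index (0, 0, 0, 1, ...). The `astype(int)` step is also an addition, because `explode` leaves the column with `object` dtype.

```python
import pandas as pd

# Sample frame shaped like the question's data (first two rows)
df = pd.DataFrame({
    "EIN": [10043280, 10391479],
    "file_num": [[2748, 3010, 4410], [217, 829, 1753, 3131, 4376, 7428, 8048]],
})

# explode() gives each list element its own row and repeats the EIN value;
# reset_index(drop=True) renumbers the rows 0..n-1
out = df.explode("file_num").reset_index(drop=True)

# The exploded column has object dtype; convert it back to integers
out["file_num"] = out["file_num"].astype(int)
print(out)
```

On pandas 1.1 or later, `df.explode("file_num", ignore_index=True)` renumbers the rows in one step instead of calling `reset_index`.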
  • That doesn't work for me: the values are strings such as `'[x,y,z]'` rather than lists, probably because I converted the DataFrame to CSV and back. How can I convert them to lists so I can use the explode method? – Matthew Kaplan Aug 07 '20 at 23:22
  • `import ast; df['file_num'] = df.file_num.apply(ast.literal_eval)` – BENY Aug 07 '20 at 23:25
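The sketch below reproduces the commenter's situation under one assumption: the frame was saved with `to_csv` and reloaded with `read_csv`, which stores each list as its string form. An in-memory `io.StringIO` buffer stands in for a real CSV file. It then applies the `ast.literal_eval` fix from the comment before exploding. Unlike `eval`, `literal_eval` only accepts Python literals, so it is safe to run on file contents.

```python
import ast
import io

import pandas as pd

df = pd.DataFrame({
    "EIN": [10043280, 986001141],
    "file_num": [[2748, 3010, 4410], [4864, 6973, 7147]],
})

# Round-trip through CSV: each list is written as text like "[2748, 3010, 4410]"
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
df2 = pd.read_csv(buf)
print(type(df2.loc[0, "file_num"]))  # <class 'str'>

# Parse each string back into a real Python list
df2["file_num"] = df2["file_num"].apply(ast.literal_eval)

# Now explode works as in the answer
out = df2.explode("file_num").reset_index(drop=True)
out["file_num"] = out["file_num"].astype(int)
print(out)
```

If the file is loaded often, the same parser can be applied at read time with `pd.read_csv(path, converters={"file_num": ast.literal_eval})`, where `path` is a placeholder for the CSV file.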