Pandas: split column of lists of unequal length into multiple columns

Question

I have a Pandas dataframe that looks like the below:

                   codes
1                  [71020]
2                  [77085]
3                  [36415]
4                  [99213, 99287]
5                  [99233, 99233, 99233]

I'm trying to split the lists in df['codes'] into columns, like the below:

                   code_1      code_2      code_3   
1                  71020
2                  77085
3                  36415
4                  99213       99287
5                  99233       99233       99233

where columns that don't have a value (because the list was not that long) are filled with blanks or NaNs or something.

I've seen answers like this one and others similar to it, and while they work on lists of equal length, they all throw errors when I try to use the methods on lists of unequal length. Is there a good way do to this?

piRSquared · Accepted Answer · 2017-06-20T22:33:43.430

37

Try:

pd.DataFrame(df.codes.values.tolist()).add_prefix('code_')

   code_0   code_1   code_2
0   71020      NaN      NaN
1   77085      NaN      NaN
2   36415      NaN      NaN
3   99213  99287.0      NaN
4   99233  99233.0  99233.0

Include the index

pd.DataFrame(df.codes.values.tolist(), df.index).add_prefix('code_')

   code_0   code_1   code_2
1   71020      NaN      NaN
2   77085      NaN      NaN
3   36415      NaN      NaN
4   99213  99287.0      NaN
5   99233  99233.0  99233.0

We can nail down all the formatting with this:

f = lambda x: 'code_{}'.format(x + 1)
pd.DataFrame(
    df.codes.values.tolist(),
    df.index, dtype=object
).fillna('').rename(columns=f)

   code_1 code_2 code_3
1   71020              
2   77085              
3   36415              
4   99213  99287       
5   99233  99233  99233

edited Jun 20 '17 at 22:33

answered Jun 20 '17 at 22:27

piRSquared

285,575
57
475
624

If I have a column to keep. Is there any way to do it rather than merging back after spliting? – steven Aug 24 '20 at 22:05
This works well as long as the first rows in the selected columns are not empty. If they are empty then the list will not effectively split. A work around is to dropna, keep the indexes as shown above and then rejoin on the initial dataframe. – pendletonian Mar 01 '22 at 18:03
@pendletonian - do you have a solution showing the use of `dropna`? I'm facing this problem at the moment. – Andy Clifton Nov 10 '22 at 23:18

score 20 · Answer 2 · answered Jun 20 '17 at 22:29

20

Another solution:

In [95]: df.codes.apply(pd.Series).add_prefix('code_')
Out[95]:
    code_0   code_1   code_2
1  71020.0      NaN      NaN
2  77085.0      NaN      NaN
3  36415.0      NaN      NaN
4  99213.0  99287.0      NaN
5  99233.0  99233.0  99233.0

answered Jun 20 '17 at 22:29

MaxU - stand with Ukraine

205,989
36
386
419

1

This solution works best for me as it's able to handle a series that occasionally has empty lists – David Jacques Jan 19 '21 at 10:54

score 0 · Answer 3 · answered Mar 08 '22 at 10:17

I have created a function using @piRSquared solution that unrolls a dataframe from 3d (row x column x list of values) to 2d (row x column_{n})

def unroll_dataframe_columns_of_lists_to_columns(df):
    new_df = pd.DataFrame()
    for col in df.columns:
        new_df = pd.concat([new_df, pd.DataFrame(df[col].values.tolist()).add_prefix(col + '_')], axis=1)
    new_df.index = df.index
    return new_df

Pandas: split column of lists of unequal length into multiple columns

3 Answers3

Linked