29

I have a Pandas dataframe that looks like the below:

                   codes
1                  [71020]
2                  [77085]
3                  [36415]
4                  [99213, 99287]
5                  [99233, 99233, 99233]

I'm trying to split the lists in df['codes'] into columns, like the below:

                   code_1      code_2      code_3   
1                  71020
2                  77085
3                  36415
4                  99213       99287
5                  99233       99233       99233

where columns that don't have a value (because the list was not that long) are filled with blanks or NaNs or something.

I've seen answers like this one and others similar to it, and while they work on lists of equal length, they all throw errors when I try to use the methods on lists of unequal length. Is there a good way do to this?

user139014
  • 1,445
  • 2
  • 19
  • 33

3 Answers3

37

Try:

pd.DataFrame(df.codes.values.tolist()).add_prefix('code_')

   code_0   code_1   code_2
0   71020      NaN      NaN
1   77085      NaN      NaN
2   36415      NaN      NaN
3   99213  99287.0      NaN
4   99233  99233.0  99233.0

Include the index

pd.DataFrame(df.codes.values.tolist(), df.index).add_prefix('code_')

   code_0   code_1   code_2
1   71020      NaN      NaN
2   77085      NaN      NaN
3   36415      NaN      NaN
4   99213  99287.0      NaN
5   99233  99233.0  99233.0

We can nail down all the formatting with this:

f = lambda x: 'code_{}'.format(x + 1)
pd.DataFrame(
    df.codes.values.tolist(),
    df.index, dtype=object
).fillna('').rename(columns=f)

   code_1 code_2 code_3
1   71020              
2   77085              
3   36415              
4   99213  99287       
5   99233  99233  99233
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • If I have a column to keep. Is there any way to do it rather than merging back after spliting? – steven Aug 24 '20 at 22:05
  • This works well as long as the first rows in the selected columns are not empty. If they are empty then the list will not effectively split. A work around is to dropna, keep the indexes as shown above and then rejoin on the initial dataframe. – pendletonian Mar 01 '22 at 18:03
  • @pendletonian - do you have a solution showing the use of `dropna`? I'm facing this problem at the moment. – Andy Clifton Nov 10 '22 at 23:18
20

Another solution:

In [95]: df.codes.apply(pd.Series).add_prefix('code_')
Out[95]:
    code_0   code_1   code_2
1  71020.0      NaN      NaN
2  77085.0      NaN      NaN
3  36415.0      NaN      NaN
4  99213.0  99287.0      NaN
5  99233.0  99233.0  99233.0
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
0

I have created a function using @piRSquared solution that unrolls a dataframe from 3d (row x column x list of values) to 2d (row x column_{n})

def unroll_dataframe_columns_of_lists_to_columns(df):
    new_df = pd.DataFrame()
    for col in df.columns:
        new_df = pd.concat([new_df, pd.DataFrame(df[col].values.tolist()).add_prefix(col + '_')], axis=1)
    new_df.index = df.index
    return new_df
Youstanzr
  • 605
  • 1
  • 8
  • 16