
Given a dataframe:

FrameLen    FrameCapLen      IPHdrLen          IPLen  ...     Loss_25     Loss_50          Interval PacketTime
0      [118.0, 66.0]  [118.0, 66.0]  [20.0, 20.0]  [104.0, 52.0]  ...  [0.0, 0.0]  [0.0, 0.0]    [918.0, 918.0]   0.000031
1      [120.0, 66.0]  [120.0, 66.0]  [20.0, 20.0]  [106.0, 52.0]  ...  [0.0, 0.0]  [0.0, 0.0]  [3527.0, 3527.0]   0.000011
2      [117.0, 66.0]  [117.0, 66.0]  [20.0, 20.0]  [103.0, 52.0]  ...  [0.0, 0.0]  [0.0, 0.0]  [1256.0, 1256.0]   0.000016
3      [118.0, 66.0]  [118.0, 66.0]  [20.0, 20.0]  [104.0, 52.0]  ...  [0.0, 0.0]  [0.0, 0.0]    [652.0, 652.0]   0.000017
4      [119.0, 66.0]  [119.0, 66.0]  [20.0, 20.0]  [105.0, 52.0]  ...  [0.0, 0.0]  [0.0, 0.0]      [44.0, 44.0]   0.000032
...              ...            ...           ...            ...  ...         ...         ...               ...        ...
83287  [117.0, 66.0]  [117.0, 66.0]  [20.0, 20.0]  [103.0, 52.0]  ...  [0.0, 0.0]  [0.0, 0.0]    [472.0, 472.0]   0.000024

All the columns containing a list have the following types:

<class 'pandas.core.series.Series'>
0        [118.0, 66.0]
1        [120.0, 66.0]
2        [117.0, 66.0]
3        [118.0, 66.0]
4        [119.0, 66.0]
             ...
83287    [117.0, 66.0]
83288    [120.0, 66.0]
83289    [117.0, 66.0]
83290    [116.0, 66.0]
83291    [122.0, 66.0]

How can I expand each list-valued column into one column per list element, such that the result is:

FrameLen_1   FrameLen_2   FrameCapLen_1, ..., ...
118.0        66.0         118.0

Ideally this would work without knowing in advance which columns, or how many of them, contain lists.

rshah
  • This should help: https://stackoverflow.com/questions/35491274/pandas-split-column-of-lists-into-multiple-columns – TYZ Jul 16 '20 at 15:07
  • Does this answer your question? [Pandas split column of lists into multiple columns](https://stackoverflow.com/questions/35491274/pandas-split-column-of-lists-into-multiple-columns) – bigbounty Jul 16 '20 at 15:09
  • That is quite manual, and I have a large number of columns in this format. I was looking for a more autonomous way without specifying the columns, as not everyone will know what columns may have this. – rshah Jul 16 '20 at 15:10

1 Answer


stack/str/concat

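# Stack into one Series of lists, then index into each list with .str
stacked = df.stack().str
# One column per list position; unstack brings the original column names back
pd.concat([stacked[0], stacked[1]], axis=1) \
  .unstack().swaplevel(1, 0, axis=1).sort_index(axis=1)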

  FrameLen     IPLen    
         0   1     0   1
0      118  66   104  52
1      120  66   106  52

You can pass a dict instead of a list to choose the position labels yourself

stacked = df.stack().str
pd.concat({'1': stacked[0], '2': stacked[1]}, axis=1) \
  .unstack().swaplevel(1, 0, 1).sort_index(axis=1)

  FrameLen     IPLen    
         1   2     1   2
0      118  66   104  52
1      120  66   106  52
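
Both variants above return two-level column labels. If flat names such as FrameLen_1 are wanted, as in the question, the two levels can be joined afterwards. A minimal sketch, assuming the two-column frame from Setup and that every cell is a two-element list:

```python
import pandas as pd

# Same small frame as in Setup.
df = pd.DataFrame({
    'FrameLen': [[118, 66], [120, 66]],
    'IPLen': [[104, 52], [106, 52]],
})

stacked = df.stack().str
wide = (
    pd.concat({'1': stacked[0], '2': stacked[1]}, axis=1)
    .unstack()
    .swaplevel(1, 0, axis=1)
    .sort_index(axis=1)
)

# Join (column name, position) pairs into single labels.
wide.columns = [f'{col}_{pos}' for col, pos in wide.columns]
print(wide)
```

Note that the values come back with object dtype here, so a later astype or pd.to_numeric may be needed depending on what follows.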

Comprehension

dat = [
    {f'{c}_{i}': x
     for c, X in zip(df, tup)
     for i, x in enumerate(X)}
    for tup in zip(*map(df.get, df))
]

pd.DataFrame(dat)

   FrameLen_0  FrameLen_1  IPLen_0  IPLen_1
0         118          66      104       52
1         120          66      106       52

You can pass a start value to enumerate so that the suffixes begin at 1

dat = [
    {f'{c}_{i}': x
     for c, X in zip(df, tup)
     for i, x in enumerate(X, 1)}
    for tup in zip(*map(df.get, df))
]

pd.DataFrame(dat)

   FrameLen_1  FrameLen_2  IPLen_1  IPLen_2
0         118          66      104       52
1         120          66      106       52
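
The comprehension assumes every column holds lists, while the frame in the question also has a scalar column (PacketTime). One possible sketch that checks each column and expands only the list-valued ones is below. The helper expand_list_columns and its start parameter are hypothetical names for illustration, not pandas API, and the sample frame is a cut-down stand-in for the one in the question:

```python
import pandas as pd

# Assumed sample: two list-valued columns plus one scalar column.
df = pd.DataFrame({
    'FrameLen': [[118.0, 66.0], [120.0, 66.0]],
    'IPLen': [[104.0, 52.0], [106.0, 52.0]],
    'PacketTime': [0.000031, 0.000011],
})


def expand_list_columns(frame, start=1):
    """Expand columns whose values are all lists/tuples into numbered columns."""
    parts = []
    for name in frame.columns:
        col = frame[name]
        is_list_col = len(col) > 0 and col.map(
            lambda v: isinstance(v, (list, tuple))
        ).all()
        if is_list_col:
            # One new column per list position, keeping the original row index.
            expanded = pd.DataFrame(col.tolist(), index=frame.index)
            expanded.columns = [
                f'{name}_{i}' for i in range(start, start + expanded.shape[1])
            ]
            parts.append(expanded)
        else:
            # Scalar columns pass through unchanged.
            parts.append(col.to_frame())
    return pd.concat(parts, axis=1)


result = expand_list_columns(df)
print(result)
```

If the lists in a column have different lengths, the DataFrame constructor pads the shorter rows with missing values, so this sketch does not require a fixed list length.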

Setup

import pandas as pd

df = pd.DataFrame({
    'FrameLen': [[118, 66], [120, 66]],
    'IPLen': [[104, 52], [106, 52]]
})
piRSquared