Given a Series object which I have pulled from a dataframe, for example through:

columns = list(df)
for col in columns:
    s = df[col] # The series object

The Series contains a <class 'list'> in each row, making it look like this:

0       [116, 66]
2       [116, 66]
4       [116, 66]
6       [116, 66]
8       [116, 66]
          ...
1498    [117, 66]
1500    [117, 66]
1502    [117, 66]
1504    [117, 66]
1506    [117, 66]

How could I split this up so that it becomes two columns instead of one Series of lists?

0       116   66
2       116   66
          ...
1506    117   66

And then append it back to the original df?

rshah
  • Does this answer your question? [Pandas split column of lists into multiple columns](https://stackoverflow.com/questions/35491274/pandas-split-column-of-lists-into-multiple-columns) – sushanth Jul 19 '20 at 13:11
  • `pd.DataFrame(s.tolist())`? – Ch3steR Jul 19 '20 at 13:14
  • @Ch3steR that works! But how can I retain the name of the original column? It just splits the list up into two columns named `0` and `1`. – rshah Jul 19 '20 at 13:18
  • Since it's a Series it only has one name, so how would you want to name the second column? – Ch3steR Jul 19 '20 at 13:20
  • @rshah, ``pd.DataFrame(s.to_list(), columns=['a','b'])`` – sushanth Jul 19 '20 at 13:21
  • @Sushanth It solves the problem, but you have to rewrite the column names manually every time, which would be difficult if the inner lists are longer. If OP provides more details on how to go about the column names, maybe we can come up with something dynamic ;) – Ch3steR Jul 19 '20 at 13:25
  • @Ch3steR I have posted an answer which is dynamic ;) Thanks! – rshah Jul 19 '20 at 13:28

1 Answer

Following Ch3steR's comment suggesting `pd.DataFrame(s.tolist())`, I got the result I was looking for, including renaming the columns of the new DataFrame so that they carry the column name of the original Series.

columns = list(df)
for col in columns:
    # Expand the lists into one column per list element
    df2 = pd.DataFrame(df[col].tolist())
    # Name the new columns <col>_0, <col>_1, ...
    df2.columns = [col+"_"+str(y) for y in range(len(df2.columns))]
    print(df2)

To keep this shorter, as also suggested by Ch3steR, we can simplify the above to:

columns = list(df)
for col in columns:
    # The prefix includes the underscore so the names come out as FrameLen_0, FrameLen_1
    df2 = pd.DataFrame(df[col].tolist()).add_prefix(col + "_")
    print(df2)

In my case, this gives the following output:

     FrameLen_0  FrameLen_1 
0           116          66
1           116          66
2           116          66
3           116          66
4           116          66
..          ...         ...
749         117          66
750         117          66
751         117          66
752         117          66
753         117          66
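
Note that the output index runs from 0 to 753 rather than 0 to 1506, because `pd.DataFrame(df[col].tolist())` renumbers the rows, and that the loop only prints each expanded frame. To get the new columns back onto the original `df`, something along these lines should work. This is a minimal sketch on made-up data: the `FrameCap` column and all of the values are assumptions for illustration, and it assumes every list within a column has the same length.

```python
import pandas as pd

# Hypothetical sample data: list-valued columns on a non-contiguous index,
# mimicking the frame in the question ("FrameCap" is an invented column).
df = pd.DataFrame(
    {
        "FrameLen": [[116, 66], [116, 66], [117, 66]],
        "FrameCap": [[10, 20], [11, 21], [12, 22]],
    },
    index=[0, 2, 4],
)

parts = []
for col in df.columns:
    # Passing index=df.index keeps the original row labels, so the
    # expanded columns line up with df instead of being renumbered from 0.
    expanded = pd.DataFrame(df[col].tolist(), index=df.index).add_prefix(f"{col}_")
    parts.append(expanded)

# Attach the expanded columns to the original frame, column-wise.
result = pd.concat([df] + parts, axis=1)
print(result)
```

If the original list columns are no longer needed, `result.drop(columns=list(df.columns))` leaves only the expanded ones.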
rshah
  • `df2.columns = [col+"_"+str(y) for y in range(len(df2.columns))]` You don't need this; use `df.add_prefix` here, i.e. `df2 = pd.DataFrame(df[col].tolist()).add_prefix(col)` – Ch3steR Jul 19 '20 at 13:32
  • What's the shape of your `df`? Can you post `df.to_dict()` to the question? (If the `df` is large, `df.head(10).to_dict()`.) Then we can come up with a solution without a `for` loop. – Ch3steR Jul 19 '20 at 13:37
  • In this small example my df has shape `(754, 15)` but the true dataset I have has the shape `(83292, 15)` – rshah Jul 19 '20 at 13:39
  • Post `df.head(10).to_dict()` then; that would only be the first 10 rows' data. – Ch3steR Jul 19 '20 at 13:41