0

I am trying to split a column containing a list of values into multiple columns after groupby. I want to do this on the fly by adding columns based on index position rather than creating a new separate Dataframe.

The below set of code generates data and creates a groupby object and a column list using the custom function fnpercentiles

import pandas as pd

#DATA
df = pd.DataFrame(data=
    {'ticker': ['AAPL','AAPL','AAPL','IBM','IBM','IBM'],
       'side': ['B','B','S','S','S','B'],
       'size': [100, 200, 300, 400, 100, 200],
      'price': [10.12, 10.13, 10.14, 20.3, 20.2, 20.1]})

#FUNCTION
def fnpercentiles(a):
    return [np.percentile(a, 0.25), np.percentile(a, 0.75)]

g = df.groupby(['ticker', 'side'])

#OPERATION
g12=pd.DataFrame()
g12['price/mean'] = g['size'].mean()/g['price'].sum()
g12['fn-cust'] = g['price'].agg([fnpercentiles])

I can generate a new Dataframe by splitting columns (see code below)

h12 = pd.DataFrame(g12['fn-cust'].tolist())

But I want to insert individual columns from the list directly into the Dataframe that's already being generated. I tried the below code and a few variants to no avail

#doesn't work
g12['list_col1'] = g['price'].agg([fnpercentiles]).tolist()[0]

A workaround would be to first split the list into a new Dataframe and insert other columns later. But is there a way to achieve I want without this hack?

user13874
  • 367
  • 2
  • 10
  • Without any further context (that might've been helpful I should add) you can extract specific column using `g['price'].agg([fnpercentiles]).str[i]` where i is the ith column. – cs95 Jun 29 '19 at 15:36
  • see https://stackoverflow.com/questions/35491274/pandas-split-column-of-lists-into-multiple-columns – Ouyang Ze Jun 29 '19 at 17:42

1 Answers1

2

You could do it using list comprehensions :

g12['list_col1'] = [x[0] for x in g12['fn-cust'].tolist()]
g12['list_col2'] = [x[1] for x in g12['fn-cust'].tolist()] 

You can even do it in one line using the zip function :

g12['list_col1'], g12['list_col2'] = zip(*g12['fn-cust'].tolist())

Output :

             price/mean                 fn-cust  list_col1  list_col2
ticker side                                                          
AAPL   B       7.407407  [10.120025, 10.120075]  10.120025  10.120075
       S      29.585799          [10.14, 10.14]  10.140000  10.140000
IBM    B       9.950249            [20.1, 20.1]  20.100000  20.100000
       S       6.172840    [20.20025, 20.20075]  20.200250  20.200750
vlemaistre
  • 3,301
  • 13
  • 30