-1

I have a dataframe that looks like this:

amount1 amount2 period  customerId
3.0     4.0     weekly   12345
23.0    26.0    monthly  12345
201.0   190.0   yearly    12345
3.1     4.2     weekly   56789
21.2    36.5    monthly  56789
231.0   191.0   yearly    56789

I'd like to flatten it like this :

weekly_amt1 weekly_amt2 monthly_amt1 monthly_amt2 yearly_amt1 yearly_amt2 customerId
3.0         4.0         23.0         26.0         201.0       190.0       12345
3.1         4.2         21.2         36.5         231.0       191.0       56789

What would be the most efficient way to do this ? I know that there is a pivot function, but it doesn't seem to handle pivoting on more than 1 column, and since it's expensive I don't want to pivot more than once.

femibyte
  • 3,317
  • 7
  • 34
  • 59

1 Answers1

0

I've been able to find an answer:

def pivot_mult(df, cols):
    mydf = df.select('customerId').drop_duplicates()
    for c in cols:
        mydf = mydf.join(df.withColumn('combcol',\
               F.concat(F.lit('{}_'.format(c)),df['period']))\
                .groupby('customerId').pivot('combcol').\
                         agg(F.first(c)),'customerId')
    return mydf
femibyte
  • 3,317
  • 7
  • 34
  • 59