I am trying to build a pandas equivalent of the PySpark groupby, pivot, and sum described in this Stack Overflow question: LINK

I tried applying pivot after groupby on a pandas DataFrame, but it is not giving the expected results. Sample data:

    import pandas as pd

    data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Score': [28, 34, 29, 42], 'Status': [1, 0, 1, 0]}
    df = pd.DataFrame(data)
    # Attempt: a pandas groupby object has no pivot method, so this line fails
    df.groupby(['Name']).pivot(index='Status')['Score'].sum()
  • See [DataFrame.pivot_table](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot_table.html) _e.g._ `new_df = df.pivot_table(index='Name', columns='Status', values='Score', aggfunc='sum', fill_value=0)` – Henry Ecker Oct 10 '22 at 00:02
  • @HenryEcker, what if one value in 'Name' column is blank? – Victor Johnson Oct 10 '22 at 00:08
  • You'll need to figure out what you want to do with the missing values. They'll be excluded because they can't be grouped. If you're looking to do something with them you'll need to replace the missing values with some value _e.g._ "other" or "missing" etc. – Henry Ecker Oct 10 '22 at 00:15
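
To make the comments above concrete, here is a minimal sketch of the `pivot_table` suggestion combined with one way of handling a blank name. The sample data is adapted from the question, with the last name set to `None` to simulate the blank value, and the `"missing"` placeholder is just an assumption for illustration; pick whatever label suits your data.

```python
import pandas as pd

# Sample data adapted from the question; the last Name is deliberately
# missing (an assumption made here to illustrate the blank-name case).
data = {
    "Name": ["Tom", "Jack", "Steve", None],
    "Score": [28, 34, 29, 42],
    "Status": [1, 0, 1, 0],
}
df = pd.DataFrame(data)

# Group by Name, spread Status values into columns, and sum Score.
# Rows with a missing Name are left out because they cannot be grouped.
pivoted = df.pivot_table(
    index="Name", columns="Status", values="Score", aggfunc="sum", fill_value=0
)

# To keep those rows, replace the missing names with a placeholder first.
# "missing" is an arbitrary label chosen for this sketch.
filled = df.fillna({"Name": "missing"})
pivoted_filled = filled.pivot_table(
    index="Name", columns="Status", values="Score", aggfunc="sum", fill_value=0
)

print(pivoted)
print(pivoted_filled)
```

In the first result the row with the blank name is absent; in the second it appears under the placeholder label, with its score summed into the matching `Status` column.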
