Pandas equivalent of pyspark groupby, pivot, sum

Asked Oct 09 '22 at 23:54

Active Oct 10 '22 at 00:01

Viewed 54 times

I am trying to write a pandas equivalent of the PySpark groupby, pivot, and sum pattern from this Stack Overflow question: LINK

I tried applying pivot after groupby on a pandas DataFrame, but it does not give the expected results. Sample data:

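    import pandas as pd

    # Sample data: four rows, so every list has four values
    data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Score': [28, 34, 29, 42], 'Status': [1, 0, 1, 0]}
    df = pd.DataFrame(data)
    # Attempted groupby followed by pivot (this is the call that does not work)
    df.groupby(['Name']).pivot(index='Status')['Score'].sum()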

Victor Johnson

See [DataFrame.pivot_table](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot_table.html) _e.g._ `new_df = df.pivot_table(index='Name', columns='Status', values='Score', aggfunc='sum', fill_value=0)` – Henry Ecker Oct 10 '22 at 00:02
@HenryEcker, what if one value in the 'Name' column is blank? – Victor Johnson Oct 10 '22 at 00:08
You'll need to figure out what you want to do with the missing values. They'll be excluded because they can't be grouped. If you're looking to do something with them you'll need to replace the missing values with some value _e.g._ "other" or "missing" etc. – Henry Ecker Oct 10 '22 at 00:15
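
The suggestions in the comments above can be put together roughly as follows. This is only a sketch under a few assumptions: the sample data is trimmed to four `Status` values, one `Name` is deliberately set to `None` to mimic the "blank" case, and the placeholder label `"missing"` is an arbitrary choice. The `groupby` plus `unstack` line is an alternative spelling that is not from the comments; it is included because it reads closer to the PySpark `groupBy().pivot().sum()` chain.

```python
import pandas as pd

# Sample data adapted from the question (assumption: four Status values,
# and one Name replaced by None to illustrate the missing-value case).
data = {
    "Name": ["Tom", "Jack", None, "Ricky"],
    "Score": [28, 34, 29, 42],
    "Status": [1, 0, 1, 0],
}
df = pd.DataFrame(data)

# pivot_table as suggested in the comments: one row per Name,
# one column per Status, summed Score, 0 where a combination is absent.
# The row with the missing Name is excluded by default.
dropped = df.pivot_table(
    index="Name", columns="Status", values="Score", aggfunc="sum", fill_value=0
)

# To keep that row, replace the missing Name with a placeholder first.
# "missing" is an arbitrary label chosen for this sketch.
kept = df.fillna({"Name": "missing"}).pivot_table(
    index="Name", columns="Status", values="Score", aggfunc="sum", fill_value=0
)

# Alternative spelling: group on both keys, sum, then move Status to columns.
unstacked = (
    df.fillna({"Name": "missing"})
    .groupby(["Name", "Status"])["Score"]
    .sum()
    .unstack(fill_value=0)
)

print(dropped)
print(kept)
```

Whether to use a placeholder, drop the rows, or handle them separately depends on what the missing names mean in your data.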

0 Answers