-2

I have a file full of data

Come up against an issue where i want to summarize results in a query.

Example being as follows:

NAME | FRUIT | PRICE 
JOHN | APPLE |  2
JOHN | APPLE |  2
JOHN | APPLE |  2
JOHN | APPLE |  2
JOHN | GRAPE |  3
DAVE | GRAPE |  3
DAVE | GRAPE |  3
DAVE | GRAPE |  3

This is my table at the moment, what i need though is to have a summary of Johns business, like below:

NAME | FRUIT      | PRICE 
JOHN | APPLE      |  2
JOHN | APPLE      |  2
JOHN | APPLE      |  2
JOHN | APPLE      |  2
JOHN | APPLETOTAL |  8 
JOHN | GRAPE      |  3
JOHN | GRAPETOTAL |  3
DAVE | GRAPE      |  3
DAVE | GRAPE      |  3
DAVE | GRAPE      |  3

Can this be done?

Nonagon
  • 397
  • 1
  • 5
  • 21
  • Possible duplicate of [Pandas group-by and sum](https://stackoverflow.com/questions/39922986/pandas-group-by-and-sum) – iacob Jun 28 '18 at 13:37

2 Answers2

1

You can try this :

  1. Create a summary dataframe

    summary_df = df.groupby(["NAME","FRUIT"] )["PRICE"].sum()

  2. Then mutate the FRUIT column like this

    summary_df.FRUIT = summary_df.FRUIT.apply(lambda x : x + "TOTAL")

  3. Finally add the dataframe back to the base df

    df = pd.concat([df, summary_df])

You can sort this dataframe df by column FRUIT to get a sorted dataframe

Nick M
  • 822
  • 1
  • 7
  • 20
1

This should sort you out:

totals = df.groupby(['FRUIT', 'NAME']).sum().reset_index()
totals['FRUIT'] = totals['FRUIT'] + 'TOTAL'

result = pd.concat([df, totals]).sort_values(['NAME', 'FRUIT']).reset_index(drop=True)
result
#    NAME       FRUIT  PRICE
#0   DAVE       GRAPE      3
#1   DAVE       GRAPE      3
#2   DAVE       GRAPE      3
#3   DAVE  GRAPETOTAL      9
#4   JOHN       APPLE      2
#5   JOHN       APPLE      2
#6   JOHN       APPLE      2
#7   JOHN       APPLE      2
#8   JOHN  APPLETOTAL      8
#9   JOHN       GRAPE      3
#10  JOHN  GRAPETOTAL      3
zipa
  • 27,316
  • 6
  • 40
  • 58