2

I would like to get 3 cities with the biggest "raised" value for each state based on the dataframe below.

So, there is a number of campaigns for each city. I want to sum up "raised" amounts for each of the cities, to get a total raised amount for each city, and then to display three top cities cities in terms of the "raised" amount for each state.

     category                  city            created     goal                                               name   raised state                                                url
0     Medical                OXNARD  December 30, 2018  15000.0                           Wayne's Cancer Care Fund     80.0    CA  https://www.gofundme.com/8qz8h6-waynes-cancer-...
1     Medical           CHINO HILLS   December 4, 2018   2000.0            Mother of two has cancer and needs help    500.0    CA                   https://www.gofundme.com/3qi0rog
2     Medical          BATTLE CREEK   December 6, 2018  10000.0                                     Hospital costs    570.0    MI                   https://www.gofundme.com/3sbwals
3     Medical  FEASTERVILLE TREVOSE   December 3, 2018  10000.0                          Help raise Joey & Brianna   2200.0    MI         https://www.gofundme.com/get-away-from-him
4     Medical       WEST PALM BEACH  December 12, 2018   6000.0                        Kelvin McCray Recovery Fund   2450.0    MI      https://www.gofundme.com/send-ricky-to-school
5     Medical                 JONES  December 11, 2018  25000.0           Wheelchair Accessible Vehicle for Taelor   2270.0    OK       https://www.gofundme.com/HelpTaelorTransport
6     Medical                CONROE  December 20, 2018  10000.0                       "A Good friend in dire need"   1250.0    OK                   https://www.gofundme.com/4dmeoko

Sample date in json:

[{
    "category": "Medical",
    "city": "OXNARD",
    "created": "December 30, 2018",
    "goal": 15000.0,
    "name": "Wayne's Cancer Care Fund",
    "raised": 80.0,
    "state": "CA",
    "url": "https://www.gofundme.com/8qz8h6-waynes-cancer-care-fund"
},
{
    "category": "Medical",
    "city": "CHINO HILLS",
    "created": "December 4, 2018",
    "goal": 2000.0,
    "name": "Mother of two has cancer and needs help",
    "raised": 500.0,
    "state": "CA",
    "url": "https://www.gofundme.com/3qi0rog"
},

{
    "category": "Medical",
    "city": "BATTLE CREEK",
    "created": "December 6, 2018",
    "goal": 10000.0,
    "name": "Hospital costs",
    "raised": 570.0,
    "state": "MI",
    "url": "https://www.gofundme.com/3sbwals"
},
{
    "category": "Medical",
    "city": "FEASTERVILLE TREVOSE",
    "created": "December 3, 2018",
    "goal": 10000.0,
    "name": "Help raise Joey & Brianna",
    "raised": 2200.0,
    "state": "MI",
    "url": "https://www.gofundme.com/get-away-from-him"
},
{
    "category": "Medical",
    "city": "WEST PALM BEACH",
    "created": "December 12, 2018",
    "goal": 6000.0,
    "name": "Kelvin McCray Recovery Fund",
    "raised": 2450.0,
    "state": "MI",
    "url": "https://www.gofundme.com/send-ricky-to-school"
},
{
    "category": "Medical",
    "city": "JONES",
    "created": "December 11, 2018",
    "goal": 25000.0,
    "name": "Wheelchair Accessible Vehicle for Taelor",
    "raised": 2270.0,
    "state": "OK",
    "url": "https://www.gofundme.com/HelpTaelorTransport"
},
{
    "category": "Medical",
    "city": "CONROE",
    "created": "December 20, 2018",
    "goal": 10000.0,
    "name": "\"A Good friend in dire need\"",
    "raised": 1250.0,
    "state": "OK",
    "url": "https://www.gofundme.com/4dmeoko"
}]

The expected outcome should look like this:

123 State1 City1 100 
3          City2 99  
58         City3 98  
8   State2 City4 97  
12         City5 96  
1          City6 95  

this doesn't really help:

maxRaisedCityByState = a.df.groupby(['state','city'])['raised'].max()

The answers from the allegedly duplicate question do not help:

Ostap Didenko
  • 446
  • 2
  • 6
  • 19
  • gives me the following result: "TRENTON" – Ostap Didenko May 10 '19 at 11:30
  • Voting to reopen because the duplicate does not fully explain why or how the dataframe needs to be sorted with `df = df.sort('raised',ascending=False)` before using `df.groupby('state').head(2).reset_index(drop=True)` It only mentions possibly needing to "order/sort" without explaining how. – Ruzihm May 15 '19 at 05:57
  • @double-beep, I replaced the pic with the code and added the data sample – Ostap Didenko May 15 '19 at 16:28

1 Answers1

1

I've simplified the city and state names to make it easier to follow.

Allow me to experiment with a new solution for table formatting :)

<table><tbody><tr><th>category</th><th>city</th><th>created</th><th>goal</th><th>name</th><th>raised</th><th>state</th><th>url</th><th> </th></tr><tr><td>0</td><td>Medical</td><td>City1</td><td>December 30, 2018</td><td>15000.0</td><td>Wayne's Cancer Care Fund</td><td>80.0</td><td>State1</td><td>https://www.gofundme.com/8qz8h6-waynes-cancer-...</td></tr><tr><td>1</td><td>Medical</td><td>City1</td><td>December 4, 2018</td><td>2000.0</td><td>Mother of two has cancer and needs help</td><td>500.0</td><td>State1</td><td>https://www.gofundme.com/3qi0rog</td></tr><tr><td>2</td><td>Medical</td><td>City2</td><td>December 6, 2018</td><td>10000.0</td><td>Hospital costs</td><td>570.0</td><td>State1</td><td>https://www.gofundme.com/3sbwals</td></tr><tr><td>3</td><td>Medical</td><td>City3</td><td>December 3, 2018</td><td>10000.0</td><td>Help raise Joey &amp; Brianna</td><td>2200.0</td><td>State1</td><td>https://www.gofundme.com/get-away-from-him</td></tr><tr><td>4</td><td>Medical</td><td>City4</td><td>December 12, 2018</td><td>6000.0</td><td>Kelvin McCray Recovery Fund</td><td>2450.0</td><td>State2</td><td>https://www.gofundme.com/send-ricky-to-school</td></tr><tr><td>5</td><td>Medical</td><td>City5</td><td>December 11, 2018</td><td>25000.0</td><td>Wheelchair Accessible Vehicle for Taelor</td><td>2270.0</td><td>State2</td><td>https://www.gofundme.com/HelpTaelorTransport</td></tr><tr><td>6</td><td>Medical</td><td>City6</td><td>December 20, 2018</td><td>10000.0</td><td>"A Good friend in dire need"</td><td>1250.0</td><td>State2</td><td>https://www.gofundme.com/4dmeoko</td></tr></tbody></table>

Step 1: Drop unneeded columns:

df.drop(['category', 'created', 'goal', 'name', 'url'], inplace=True, axis = 1)

Which gives us:

    city    raised  state
0   City1   80.0    State1
1   City1   500.0   State1
2   City2   570.0   State1
3   City3   2200.0  State1
4   City4   2450.0  State2
5   City5   2270.0  State2
6   City6   1250.0  State2

Step2: Group by state & city and sum their raised amounts (only city1 in this example):

df = df.groupby(['state', 'city']).sum()

Now we have:

                raised
state   city    
State1  City1   580.0
        City2   570.0
        City3   2200.0
State2  City4   2450.0
        City5   2270.0
        City6   1250.0

Note our index went from being numerical to multilevel. Level 1 being state level 2 being the city.

Step 3: Sort by amount raised. As expected this ignores the multi-index order so after sorting we need to re-sort the index. We only sort by level 0 which is the state. :

df.sort_values('raised', ascending=False).sort_index(level=[0], sort_remaining=False).groupby('state').head(3)

Finally we have:

                raised
state   city    
State1  City3   2200.0
        City1   580.0
        City2   570.0
State2  City4   2450.0
        City5   2270.0
        City6   1250.0
Idodo
  • 1,322
  • 11
  • 18
  • this one seems to be working: a.df.sort_values('raised', ascending=False).sort_index(level=[0], sort_remaining=False).groupby('state').head(3) I am accepting your answer, just add this last line of code from my comment into it, becuase I need to select top "n" cities – Ostap Didenko May 15 '19 at 22:57
  • No problem, amended the answer. – Idodo May 16 '19 at 11:12