1

I have a df as follows:

         State  Country   Pop       GDP
   1     MD     USA        433100   1222
   2     TX     USA        99934123 324234
   3     B.C.   MEX        324599   5343
   4     OAX    MEX        43322    986
   5     BC     CAN        3431     643
   6     NS     CAN        8749     3535

I want to group by Country and find the max GDP and min GDP, returning that as well as new column names so that it looks something like this:

      PoorState   GDP    Pop       RichState   GDP      Pop
USA   MD          1222   433100    TX          324234   99934123
MEX   OAX         986    43322     B.C.        5343     324599
... 

I've tried this to get the min and max for each state:

df.groupby('state').agg({'GDP':min, 'GDP':max})[['GDP','GDP']].reset_index()

but just the min works while the max doesn't.

  State   GDP    GDP
  MD      1222   1222
  OAX     986    986

-- I know that I can do this to set the index: newdf = df.set_index('region')

I think I'm trying to figure out if I can do this all on one line.

1 Answers1

0

We can do groupby with rank create the category then pivot

df['new'] =  df.groupby('Country')['GDP'].rank().map({1:'Poor',2:'Rich'})
out = df.pivot(index='Country',columns = 'new').sort_index(level=1,axis=1)
out.columns = out.columns.map('_'.join)
out.reset_index(inplace=True)
out
Out[348]: 
  Country  GDP_Poor  Pop_Poor State_Poor  GDP_Rich  Pop_Rich State_Rich
0     CAN       643      3431         BC      3535      8749         NS
1     MEX       986     43322        OAX      5343    324599       B.C.
2     USA      1222    433100         MD    324234  99934123         TX
BENY
  • 317,841
  • 20
  • 164
  • 234