
I am stuck on the following problem. The image shows the head of my dataframe.

The dataframe has a very long list of US states as its index, and two other columns holding the county names and the 2010 census population.

My aim is this: looking only at the three most populous counties for each state, what are the three most populous states (in order of highest population to lowest population)? Use CENSUS2010POP. The function should return a list of string values.

import pandas as pd

df = pd.DataFrame({'State': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D'],
                   'County': ['Aa', 'Ab', 'Ac', 'Ad', 'Ae', 'Ba', 'Ba', 'Bb', 'Bc', 'Bd', 'Be', 'Ca', 'Cb', 'Cc', 'Cd', 'Ce', 'Da', 'Db', 'Dc', 'Dd'],
                   'Population': [25, 35, 45, 60, 12, 80, 45, 60, 20, 30, 14, 65, 87, 65, 13, 29, 45, 60, 75, 80]})
  • Could you post your data frame by e.g. df.head as codes, instead of showing a link? – tianlinhe Apr 09 '20 at 09:29
  • I did it. Could you let me know if you can help? My task is "only looking at the three most populous counties for each state, what are the three most populous states (in order of highest population to lowest population)? Use CENSUS2010POP. This function should return a list of string values." – Brijesh Prajapati Apr 09 '20 at 12:40
  • Please read [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). It makes it easier for us to help you if you post code that we can easily reproduce instead of images. Also, try to use a more descriptive title. – BioGeek Apr 09 '20 at 12:42
  • Hi, thanks for the advice. I am new here, so I did not know about it. I have done it now. Can you check whether it is okay, and can you help me with this problem? – Brijesh Prajapati Apr 09 '20 at 18:17

1 Answer

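# Sum the 3 largest county populations per state.
# Series.sum(level=0) was removed in pandas 2.0, so group on index level 0 instead.
state_group = df.groupby(['State'])['Population'].nlargest(3).groupby(level=0).sum()
# Keep the 3 states with the largest of those sums.
state_group_largest3 = state_group.nlargest(3)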

print(state_group) outputs the sum of the three largest counties in each state:

State
A    140 # because 140=35+45+60, which are the 3 largest counties in A
B    185
C    217
D    215
Name: Population, dtype: int64

print(state_group_largest3) outputs the three states with the highest population, in descending order:

State
C    217
D    215
B    185
Name: Population, dtype: int64

I think you know all the relevant Python functions (in this case groupby, nlargest and sum); sometimes you just have to apply them in a logical way :)
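Since the task asks for a function that returns a list of strings, here is a minimal sketch that wraps the two steps above. The function name `top_three_states` and its parameters are my own, not part of the assignment, and it assumes the state name is a regular column rather than the index.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'State': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B',
              'C', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D'],
    'County': ['Aa', 'Ab', 'Ac', 'Ad', 'Ae', 'Ba', 'Ba', 'Bb', 'Bc', 'Bd', 'Be',
               'Ca', 'Cb', 'Cc', 'Cd', 'Ce', 'Da', 'Db', 'Dc', 'Dd'],
    'Population': [25, 35, 45, 60, 12, 80, 45, 60, 20, 30, 14,
                   65, 87, 65, 13, 29, 45, 60, 75, 80],
})


def top_three_states(frame, state_col='State', pop_col='Population', n=3):
    """Hypothetical helper: names of the n states whose n largest counties sum highest."""
    # n largest county populations per state (result has a (state, row) MultiIndex).
    top_counties = frame.groupby(state_col)[pop_col].nlargest(n)
    # Sum those values per state by grouping on the first index level.
    state_totals = top_counties.groupby(level=0).sum()
    # nlargest sorts in descending order; the index holds the state names.
    return state_totals.nlargest(n).index.tolist()


result = top_three_states(df)
print(result)  # ['C', 'D', 'B']
```

For the census data, the same call would presumably be `top_three_states(census_df, state_col='STNAME', pop_col='CENSUS2010POP')`, where `census_df` stands for your own dataframe. If STNAME is the index, call `reset_index()` on it first.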

tianlinhe
  • It has 3 columns: STNAME, CTYNAME, and CENSUS2010POP. I got the answer for a single state, but I don't know how to iterate through all the values in the STNAME column and store those summed values in a new column. – Brijesh Prajapati Apr 09 '20 at 12:33
  • I understand the task fully. But please post your data as minimal reproducible code **inside your question**, instead of an image or link, so that people can test their answers and better help you. – tianlinhe Apr 09 '20 at 12:59
  • I did it. Can you please make it run if you understood my problem. – Brijesh Prajapati Apr 09 '20 at 18:16
  • Thank you, please see my edited answer. I also advise you to change your title so that it is more relevant, for example, 'sum of the n largest values according to column value' (people don't usually click on an unspecific title, which is why you have very few responses). – tianlinhe Apr 09 '20 at 18:34
  • Thanks a lot for this solution. I will try and will get back to you. – Brijesh Prajapati Apr 09 '20 at 19:38