I have posted a similar question several times but it was closed or redirected to another post that wasn't answering my question. I hope this time this post stays.

I have a DataFrame with US census data, with each state grouped together with its counties. Another column holds the population, sorted from high to low. All I am trying to do is slice it so that I only get the three most populous counties for each state. The final result should display the three most populous states, based on their three most populous counties.

Here's my code so far:

def answer_six():
    cdf = census_df[census_df['SUMLEV'] == 50]
    columns_to_keep = ['STNAME', 'CTYNAME', 'CENSUS2010POP']
    cdf = cdf[columns_to_keep]
    cdf = cdf.sort_values('CENSUS2010POP', ascending=False)
    cdf = cdf.groupby('STNAME')
    cdf = cdf.apply(pd.DataFrame.sort_values, 'CENSUS2010POP', ascending=False).head(100)
#    cdf = [i for i in cdf['STNAME'][:3] if all(cdf['STNAME']) == all(cdf['STNAME'])]
    return cdf
answer_six()

Here's a sample of my data:

    STNAME   CTYNAME                  CENSUS2010POP
37  Alabama  Jefferson County         658466
49  Alabama  Mobile County            412992
45  Alabama  Madison County           334811
51  Alabama  Montgomery County        229363
59  Alabama  Shelby County            195085
63  Alabama  Tuscaloosa County        194656
2   Alabama  Baldwin County           182265
41  Alabama  Lee County               140247
52  Alabama  Morgan County            119490
8   Alabama  Calhoun County           118572
28  Alabama  Etowah County            104430
35  Alabama  Houston County           101547
48  Alabama  Marshall County          93019
39  Alabama  Lauderdale County        92709
58  Alabama  St. Clair County         83593
42  Alabama  Limestone County         82782
61  Alabama  Talladega County         82291
22  Alabama  Cullman County           80406
26  Alabama  Elmore County            79303
25  Alabama  DeKalb County            71109
64  Alabama  Walker County            67023
5   Alabama  Blount County            57322
1   Alabama  Autauga County           54571
17  Alabama  Colbert County           54428
36  Alabama  Jackson County           53227
57  Alabama  Russell County           52947
23  Alabama  Dale County              50251
16  Alabama  Coffee County            49948
24  Alabama  Dallas County            43820
11  Alabama  Chilton County           43643
... ...      ...                      ...
80  Alaska   Kenai Peninsula Borough  55400
79  Alaska   Juneau City and Borough  31275
72  Alaska   Bethel Census Area       17013
Frank Jimenez
    If your post was closed, you should edit the post to resolve the reason it was closed, and then request that it be reopened, rather than post a new question. – Barmar Jun 02 '20 at 22:59

1 Answer

I am guessing that what you are looking for is cdf.groupby('STNAME').head(3), called after you sort cdf by population?
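To make that concrete, here is a minimal sketch, not a definitive solution. The tiny census_df below is made up for illustration (states X, Y, Z, W and their numbers are not real census data), and the function name top_states_by_top_counties is hypothetical. It assumes that "most populous states based on their three most populous counties" means ranking states by the sum of their top three county populations. Note that in the question's code, .head(100) is called on the DataFrame returned by apply, so it keeps the first 100 rows overall rather than a fixed number of rows per state.

```python
import pandas as pd

# Hypothetical miniature stand-in for census_df (made-up numbers).
# SUMLEV == 50 marks county rows; the SUMLEV == 40 row is a state-level total.
census_df = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50],
    "STNAME": ["X", "X", "X", "X", "X", "Y", "Y", "Y", "Y", "Z", "Z", "W", "W", "W"],
    "CTYNAME": ["X", "X1", "X2", "X3", "X4", "Y1", "Y2", "Y3", "Y4",
                "Z1", "Z2", "W1", "W2", "W3"],
    "CENSUS2010POP": [100, 40, 30, 20, 10, 50, 25, 10, 5, 60, 5, 20, 15, 10],
})


def top_states_by_top_counties(df, n_counties=3, n_states=3):
    """Return (top counties per state, names of the top states)."""
    # Keep county-level rows and only the columns of interest.
    counties = df.loc[df["SUMLEV"] == 50, ["STNAME", "CTYNAME", "CENSUS2010POP"]]

    # Sort once, then take the first n rows of every state group.
    top_counties = (
        counties.sort_values("CENSUS2010POP", ascending=False)
        .groupby("STNAME")
        .head(n_counties)
    )

    # Assumption: rank states by the summed population of those counties.
    totals = top_counties.groupby("STNAME")["CENSUS2010POP"].sum()
    return top_counties, totals.nlargest(n_states).index.tolist()


top_counties, top_states = top_states_by_top_counties(census_df)
print(top_counties)
print(top_states)
```

The important detail is that head(3) is called directly on the groupby object, so the limit applies per state; a state with fewer than three counties simply keeps all of them.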

P.S. Perhaps your questions keep getting closed because they duplicate existing ones, such as: Pandas get topmost n records within each group

CtrlMj
  • Hi, thanks for the advice. I checked the post, but it didn't help. The code "cdf.groupby('STNAME').head(3)" doesn't work either. It simply doesn't do anything :( – Frank Jimenez Jun 03 '20 at 05:10