pandas query a grouping within a grouping

Question

Having trouble writing a query where I can get the top 10 of a top 10 based on a count

My starting table from this query:

top_10_cars = 'CH', 'DA', 'AG', 'DC', 'LA', 'NY', 'SA', 'SE', 'DE', 'MI'

df = pd.read_sql("select\
                    count(*) as count\
                    ,ID\
                    ,CAR\
                    from "+db+"\
                    where pop in ('"+ "','".join(top_10_cars) +"')\
                    group by\
                    pop\
                    ,asn\
                    order by reqs desc\
                     ",conn)

Result is a list with all the IDs for every car grouping sorted by count:

Count       ID  CAR
67210048    7922    CH
2081655     20001   LA
488850583   7018    AG
567585985   7018    DA
450991      7922    SA
41123124    7018    CH
4135532     11427   DA
...
..
.

The dataframe above is too big. I only one that top 10 Ids for each car.

For example CH:

Count       ID  CAR
67210048    7922    CH
25100548    7546    CH
465100      8542    CH
67254828    5622    CH
1251048     3522    CH
...
..
.

The resulting table should look like this

Count       ID  CAR
67210048    7922    CH
25100548    7546    CH
..
.
7210048     1546    DA
251005      5678    DA
25100548    7546    DA
465100      8542    DA
...
..
67254828    5622    DA
and 
so 
on.. 'AG', 'DC', 'LA', 'NY', 'SA', 'SE', 'DE', 'MI'

This answer might help- https://stackoverflow.com/questions/20069009/pandas-get-topmost-n-records-within-each-group — Ankit Agarwal, Jul 07 '18 at 05:47
@AnkitAgarwal trying to do that in the query.. but I ended up with a .. pythonic solution (open to a sql though) — chowpay, Jul 07 '18 at 05:59

score 0 · Answer 1 · answered Jul 07 '18 at 06:01

This is probably not the correct way to do this but I just wrapped it some python:

df = pd.DataFrame()

for x in top_NA_cars:
    dftemp = pd.read_sql("select\
                            count(*) as count\
                            ,ID\
                            ,CAR\
                            from "+db+"\
                            where CAR in ('"+x+"')\
                            group by\
                            ID\
                            ,CAR\
                            order by count desc limit 10",conn)
    df = df.append(dftemp)

Open to better solutions but ^ did work.

pandas query a grouping within a grouping

1 Answers1