0

Having trouble writing a query where I can get the top 10 of a top 10 based on a count

My starting table from this query:

top_10_cars = 'CH', 'DA', 'AG', 'DC', 'LA', 'NY', 'SA', 'SE', 'DE', 'MI'

df = pd.read_sql("select\
                    count(*) as count\
                    ,ID\
                    ,CAR\
                    from "+db+"\
                    where pop in ('"+ "','".join(top_10_cars) +"')\
                    group by\
                    pop\
                    ,asn\
                    order by reqs desc\
                     ",conn)

Result is a list with all the IDs for every car grouping sorted by count:

Count       ID  CAR
67210048    7922    CH
2081655     20001   LA
488850583   7018    AG
567585985   7018    DA
450991      7922    SA
41123124    7018    CH
4135532     11427   DA
...
..
.

The dataframe above is too big. I only one that top 10 Ids for each car.

For example CH:

Count       ID  CAR
67210048    7922    CH
25100548    7546    CH
465100      8542    CH
67254828    5622    CH
1251048     3522    CH
...
..
.

The resulting table should look like this

Count       ID  CAR
67210048    7922    CH
25100548    7546    CH
..
.
7210048     1546    DA
251005      5678    DA
25100548    7546    DA
465100      8542    DA
...
..
67254828    5622    DA
and 
so 
on.. 'AG', 'DC', 'LA', 'NY', 'SA', 'SE', 'DE', 'MI' 
chowpay
  • 1,515
  • 6
  • 22
  • 44
  • This answer might help- https://stackoverflow.com/questions/20069009/pandas-get-topmost-n-records-within-each-group – Ankit Agarwal Jul 07 '18 at 05:47
  • @AnkitAgarwal trying to do that in the query.. but I ended up with a .. pythonic solution (open to a sql though) – chowpay Jul 07 '18 at 05:59

1 Answers1

0

This is probably not the correct way to do this but I just wrapped it some python:

df = pd.DataFrame()

for x in top_NA_cars:
    dftemp = pd.read_sql("select\
                            count(*) as count\
                            ,ID\
                            ,CAR\
                            from "+db+"\
                            where CAR in ('"+x+"')\
                            group by\
                            ID\
                            ,CAR\
                            order by count desc limit 10",conn)
    df = df.append(dftemp)

Open to better solutions but ^ did work.

chowpay
  • 1,515
  • 6
  • 22
  • 44