1

I am fairly new to SQL, working on some practice problems. I have a sample Twitter database and I am trying to find the top 3 users in each location based on number of followers.

Here are the tables I am using:

id_follower_location

        id       | followers | location 
-----------------+-----------+----------
 id28929238      |         1 | Toronto
 id289292338     |         1 | California
 id2892923838    |         2 | Rome
 .
 .

locations

           location       
----------------------
 Bay Area, California
 London
 Nashville, TN
.
.

I've been able to find the "top" user by:

create view top1id as 
  select location, 
    (select id_followers_location.id from id_followers_location 
      where id_followers_location.location = locations.location 
      order by followers desc limit 1
    ) as id 
  from locations;

create view top1 as 
  select location, id, 
    (select followers from id_followers_location 
      where id_followers_location.id = top1id.id
    ) as followers 
  from top1id;

The only way I have been able to come up with so solve this is by finding out the "Top 1st", the "Top 2nd", the "Top 3rd" and then using union to combine it. Is that the correct/only way of doing this? Or is there a better way?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Can a user have more than one location? – FoolishSeth Apr 14 '13 at 03:02
  • 1
    @FoolishSeth yes to postgres and no to user have more than one location. Thank you –  Apr 14 '13 at 03:02
  • 1
    Like with most questions of this kind, you forgot to mention how to deal with **ties**? What do you want to return when 5 people share the same number of followers? Do you want *exactly* three rows per location? How to decide? Random? Arbitrary? Additional criteria? – Erwin Brandstetter Apr 14 '13 at 13:45

2 Answers2

3

Top n

With rank() you get at least 3 rows (fewer if fewer exist). If there are ties among the top 3 ranks, more rows may be returned. See:

If you want exactly 3 rows per location (fewer if fewer exist), you have to break ties. One way is to use row_number() instead of rank().

SELECT *
FROM (
   SELECT id, location
        , row_number() OVER (PARTITION BY location ORDER BY followers DESC) AS rn
   FROM   id_follower_location
   ) r
WHERE  rn <= 3
ORDER  BY location, rn;

You may want to add ORDER BY to the outer query to guarantee sorted output.
If there are more than three valid candidates, you get an arbitrary pick from ties - unless you add more ORDER BY items in the OVER clause to break ties.

Top 1

As for your query to get the top 1 row: there is a much simpler and faster way in PostgreSQL:

SELECT DISTINCT ON (location)
       id, location           -- add additional columns freely
FROM   id_follower_location
ORDER  BY location, followers DESC;

Details for this query technique in this closely related answer:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
1

You can do this with windowing functions: http://www.postgresql.org/docs/9.1/static/tutorial-window.html

For example (not tested may need slight syntax fixes):

SELECT follower_ranks.id, follower_ranks.location 
FROM (
    SELECT id, location, 
      RANK() OVER (PARTITION BY location ORDER BY followers DESC) 
    FROM id_follower_location
) follower_ranks 
WHERE follower_ranks.rank <= 3;
FoolishSeth
  • 3,953
  • 2
  • 19
  • 28