2

I have a table in a MySQL database that contains IDs of persons ´pid´ and names of cities city that those people lived in. Since every person could have lived in more than one city during his/her live, one person can have more than one entry in that table.

PID | CITY
1   | Berlin
1   | New York

2   | Berlin
2   | Oslo
2   | New York
2   | Mexiko City

3   | Oslo

4   | Berlin
4   | Oslo
4   | Bagdad
4   | New York
4   | Mexiko City

There are two/three things I'd like to find out:

  1. What are the pid of all those persons, that lived in "Berlin" and in "New York" (the result should be [1, 2, 4])
  2. What are the other cities that people lived in, that once lived in "Berlin" and in "New York" (the result should be ["Oslo", "Mexiko City", "Bagdad"])
  3. It would be even better to get the counts, how often those cities occur in the table. (the result should be ["Oslo" -> 2 , "Mexiko City" -> 2, "Bagdad" -> 1])

How do I have to design my queries, to find out these things (and not overstress my database Server)?

R_User
  • 10,682
  • 25
  • 79
  • 120
  • 1
    http://www.sqlfiddle.com/#!3/645ea – mellamokb Jun 25 '12 at 19:03
  • See this question (with more than 10 different ways to achieve what you want): [How to filter SQL results in a has-many-through relation](http://stackoverflow.com/questions/7364969/how-to-filter-sql-results-in-a-has-many-through-relation) - and benchmarks. – ypercubeᵀᴹ Jun 25 '12 at 19:16
  • You have `Oslo` 3 times in the data. The count of `Oslo` should be 3, not 2. – Zane Bien Jun 25 '12 at 20:02

2 Answers2

2

You can do all of these things in one query:

SELECT
    a.pid,
    COALESCE(b.city, 'No other cities') AS othercity,
    COUNT(c.city) AS OverallCityCount
FROM
    (
        SELECT pid
        FROM tbl
        WHERE city IN ('New York', 'Berlin')
        GROUP BY pid
        HAVING COUNT(*) = 2
    ) a
LEFT JOIN
    tbl b ON a.pid = b.pid AND b.city NOT IN ('New York', 'Berlin')
LEFT JOIN
    tbl c ON b.city = c.city
GROUP BY
    a.pid, 
    b.city

The FROM subselect answers question one, where it only gets the pids where they have lived in BOTH New York AND Berlin.

Since the subselect will only return one row per pid, we must join the resulting pids against the table again to answer question 2 (get all cities lived in for the pids that have lived in both New York and Berlin). If the user has lived in both New York and Berlin, but not any other city, we don't want to exclude that pid, so we use a LEFT JOIN. If there are no other cities, the city field will just be No other cities.

Now to get the count of how many times the city appears in the table overall, regardless of the pid, we self-join on the table again on the city field. When this join happens, the combination of pid -> city will appear as many times as there are that city in the table, and so in order to get the count, we GROUP BY both pid and city, in which COUNT() gives us the count of the number of times the city appears in the table. If there are no other cities that the pid has lived in, this count will just be 0.

Working with your sample data, the result set would look like:

pid | othercity        | OverallCityCount
-----------------------------------------
1   | No other cities  | 0
2   | Oslo             | 3
2   | Mexiko City      | 2
4   | Oslo             | 3
4   | Bagdad           | 1
4   | Mexiko City      | 2

http://www.sqlfiddle.com/#!3/2bb23/2/0

Zane Bien
  • 22,685
  • 6
  • 45
  • 57
2

1) Search for the two cities, and find all pid that match all the cities, by counting the number of records matching cities in the list and checking if it's equal to the total number of cities being matched.

select
  pid
from
  PlacesLived
where
  city in ('Berlin', 'New York')
group by
  pid
having
  count(*) = 2

Demo: http://www.sqlfiddle.com/#!3/645ea/1


2) Query the distinct list of cities lived by the people found in (1), minus the cities that you are validating against (Berlin & New York):

select distinct
  city
from
  PlacesLived
where pid in (
  select
    pid
  from
    PlacesLived
  where
    city in ('Berlin', 'New York')
  group by
    pid
  having
    count(*) = 2
)
and city not in ('Berlin', 'New York')

Demo: http://www.sqlfiddle.com/#!3/645ea/3


3) Add group by and count to get the actual counts:

Demo: http://www.sqlfiddle.com/#!3/645ea/4

mellamokb
  • 56,094
  • 12
  • 110
  • 136