1

I'm new to sql and I'm practicing on a created database in oracle about airbnb listings in amsterdam and berlin. I'm trying to JOIN the Tables HOSTS (host_id, host_name), LISTINGS (which includes all listings in the two city with the attributes listings_id, listings_name, Price, host_id...), Neighbourhoods (Neighbourhood_Group, City and Neighbourhood) and reviews (with review_id, listings_id as a foreign key, reviewer_id, reviewer_name and comment).

Now i want to write a query, which returns the average price, lowest price, highest price, the city (Berlin or Amsterdam), the neighbourhood (Centrum, Alexanderplatz...), the amount of listings and the amount of reviews, all grouped by the distinct neighbourhood, and by the WHERE statement, that limits the returns to those listings, that are hosted by Hosts which have less than 3 listings total.

Now if I run the query without the reviews table and only order it by amount_listings, I get the correct amount of listings per neighbourhood for the column "amount_listings".


SELECT avg(l.price) as Mean_Price, 
n.city, 
n.neighbourhood, 
count (l.listings_id) as amount_listings,
min(l.price),
max(l.price)
    FROM listings l
    INNER JOIN neighborhood n
    ON l.neighbourhood = n.neighbourhood

    INNER JOIN hosts h
    ON l.host_id = h.host_id

WHERE h.host_id IN (
    SELECT host_id
    FROM listings
    GROUP BY host_id
    HAVING count(host_id) < 3
)
GROUP BY n.neighbourhood, n.city
ORDER BY amount_listings DESC;

enter image description here

But if i include the amount of reviews in the query, the results are incorrect. The amount of reviews and the amount of listings column show the wrong amount (too much).

SELECT avg(l.price) as Mean_Price, count(l.listings_id) as amount_listings, 
       min(l.price), max(l.price), n.city, n.neighbourhood, count(r.review_id) as amount_reviews
FROM listings l
INNER JOIN neighborhood n
ON l.neighbourhood = n.neighbourhood

INNER JOIN hosts h
ON l.host_id = h.host_id

INNER JOIN reviews r
ON l.listings_id= r.listings_id
WHERE h.host_id IN (
    SELECT host_id
    FROM listings
    GROUP BY host_id
    HAVING count(host_id) < 3
)

GROUP BY n.neighbourhood, n.city
ORDER BY amount_listings DESC, amount_reviews DESC;

enter image description here

I dont know why the amount_listings and amount_reviews return such wrong results.

Lajos Arpad
  • 64,414
  • 37
  • 100
  • 175
  • 1
    This seems likely to be a common error where people want some joins, each possibly involving a different key, of some subqueries, each possibly involving join and/or aggregation, [but](https://stackoverflow.com/a/45252750/3404097) they wrongly try to do all the joining then all the aggregating or to aggregate over previous aggregations. PS Please [use text, not images/links, for text--including tables & ERDs](https://meta.stackoverflow.com/q/285551/3404097). Use images only for what cannot be expressed as text or to augment text. Include a legend/key & explanation with an image. – philipxy Dec 15 '19 at 12:16
  • 1
    Please in code questions give a [mre]--cut & paste & runnable code; example input (as initialization code) with desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. For errors that includes the least code you can give that is code that you show is OK extended by code that you show is not OK. (Debugging fundamental.) For SQL that includes DBMS & DDL, which includes constraints & indexes & tabular initialization. – philipxy Dec 15 '19 at 12:16

2 Answers2

0

The cause is that since you are adding a new table with more than 0 rows, records are counted several times, depending on how many reviews are. Since you are only interested in the count of reviews, let's join that instead with a nice old trick:

SELECT avg(l.price) as Mean_Price, count(l.listings_id) as amount_listings, 
       min(l.price), max(l.price), n.city, n.neighbourhood, r.review_count as amount_reviews
FROM listings l
INNER JOIN neighborhood n
ON l.neighbourhood = n.neighbourhood

INNER JOIN hosts h
ON l.host_id = h.host_id

INNER JOIN (select reviews.listings_id as listings_id, count(*) as review_count from reviews where reviews.listings_id = l.listings_id) r
ON l.listings_id= r.listings_id
WHERE h.host_id IN (
    SELECT host_id
    FROM listings
    GROUP BY host_id
    HAVING count(host_id) < 3
)

GROUP BY n.neighbourhood, n.city
ORDER BY amount_listings DESC, amount_reviews DESC;
Lajos Arpad
  • 64,414
  • 37
  • 100
  • 175
0

Aggregate before joining. You want to join the review count to each listing and then the aggregated listing information to the neighbourhood.

select
  lr.mean_price,
  n.city,
  n.neighbourhood,
  lr.amount_listings,
  lr.min_price,
  lr.max_price,
  lr.amount_reviews
from neighborhood n
join
(
  select
    l.neighbourhood,
    min(l.price) as min_price,
    max(l.price) as max_price,
    avg(l.price) as mean_price,
    count(*) as amount_listings,
    coalesce(sum(r.reviews_for_listing), 0) as amount_reviews
  from listings l
  left join
  (
    select
      listings_id,
      count(*) as reviews_for_listing
    from reviews
    group by listings_id
  ) r on r.listings_id = l.listings_id
  where l.host_id in
  (
    select host_id
    from listings
    group by host_id
    having count(*) < 3
  )
  group by l.neighbourhood
) lr on lr.neighbourhood = n.neighbourhood
order by n.city, n.neighbourhood;
Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73