0

I'm a student using sakila dvd rental database to do a project, using PostgreSQL 15.2 and pgAdmin4. After some other work, I have this code populating a table that is going to compare rental duration and frequency among categories:

INSERT INTO summary (store_id, cat_group, total_rentals, total_titles, avg_rental_duration) 
    SELECT DISTINCT detailed.store_id, cat_group_fx(detailed.category_name), 
        (SELECT COUNT (detailed.rental_id)
             FROM detailed, summary AS selfsum
            WHERE selfsum.store_id = detailed.store_id
             AND cat_group_fx(detailed.category_name) = selfsum.cat_group
            GROUP BY detailed.store_id, selfsum.cat_group),  
        (SELECT COUNT (inventory.inventory_id)
             FROM inventory, detailed, summary AS selfsum
            WHERE inventory.film_id = detailed.film_id
             AND inventory.store_id = detailed.store_id
            AND detailed.store_id = selfsum.store_id
             AND cat_group_fx(detailed.category_name) = selfsum.cat_group
            GROUP BY detailed.store_id, selfsum.cat_group), 
        AVG(rental.return_date - rental.rental_date) 
    FROM detailed, rental 
    GROUP BY detailed.store_id, detailed.category_name;

When I run the entire code (some 60 lines) I get entire columns full of [null] for total_rentals and total_titles in my summary table. example

But when I highlight and run each individual SELECT COUNT statement by itself, it miraculously counts and the counts appear correct, images below. total_rentals count example total_titles count example

Why is this happening & how can I get the counts into the main table?

More info edit: cat_group_fx is a user function that sorts the rentals into categories by genre, using IN to compare and CASE to sort. "detailed" is another table much farther up the code that simply gathers data from the various Sakila tables, including rental_id, film_id, store_id, category.name, rental_date and return_date. The point of the "detailed" table is to gather and coordinate the relevant granular data that is then transformed into the "summary" table shown here using aggregates and user functions. It is part of the assignment.

  • You typically GROUP BY the same columns as you SELECT - except those who are arguments to aggregate functions. – jarlh Apr 14 '23 at 14:04
  • No need for SELECT DISTINCT here, your GROUP BY has already eliminated the duplicates. – jarlh Apr 14 '23 at 14:05
  • if I remove DISTINCT from the SELECT I get this error: ERROR: duplicate key value violates unique constraint "summary_pkey" DETAIL: Key (store_id, cat_group)=(2, Adult) already exists. SQL state: 23505 – jerry_lemon Apr 14 '23 at 14:16
  • If you run just the SELECT, does the result make any sense? – jarlh Apr 14 '23 at 14:35
  • Yes, the last two links in my post are images of the SELECTs running alone. The counts add up to the total and make sense that way, it just won't run as part of the whole function. – jerry_lemon Apr 14 '23 at 14:43
  • Hi - I have a few thoughts/comments: 1) I'd start be learning ANSI SQL join syntax - the join syntax you are using hasn't been good practice for 30+ years and is much harder to read/debug 2) What is cat_group_fx and what does it do? Is it a function you've written? 3) Your SELECT COUNT queries appear to be correlated subqueries but you haven't linked them back to a table in the main query (i.e. they aren't correlated). When you run them in isolation they're fine but they won't work in the larger SQL statement as they will return more than one value for each row in the main SQL statement – NickW Apr 14 '23 at 15:10
  • @NickW Thanks, I will look into ANSI SQL. cat_group_fx is a function I wrote that sorts the dvds into three categories based on whether the genre is kid-friendly or not, and it seems to work just fine as far as I can figure. It uses IN to compare genre data and sorts using CASE; it's pretty simple. As for your last advice, I'm not sure exactly what you mean by the columns not being linked to a table. They're meant to be subqueries in the first SELECT statement that populates the table called summary as shown in the pic. What can I do to tie them together? Thanks again! – jerry_lemon Apr 14 '23 at 15:18
  • The fact that the individual `SELECT COUNT...` subqueries work alone indicates that they are not correlated to the main query. The lack of explicit join criteria between _detailed_ and _rental_ is equivalent to `CROSS JOIN` in the ANSI JOIN syntax. There is no selection criteria on _rental_ anywhere in the query. What is _detailed_? It doesn't appear to be part of _Sakila DVD Rental Database_. – JohnH Apr 14 '23 at 16:20
  • "detailed" is another table I had to make for the project. It gathers rental_id, film_id, store_id, category.name, rental_date and return_date from various Sakila dvd tables. It's intended to gather and organize the granular data that is then transformed into the "summary" table using aggregates and user functions. I reference it rather than the main tables when possible because I think it's what the evaluator will want to see. Sorry, I should have put this info in the main question, I'll edit it. I'm just frustrated with it and not having my instructor to help as he's out of office. – jerry_lemon Apr 14 '23 at 16:31

1 Answers1

0

Each “column” in a select statement needs to return a single value per row in the resultset. If the column is calculated then that calculation needs to return a single value.

If the calculation involves another select statement (rather than, for example, a simple addition) then that is a subquery.

The subquery is “correlated” if it is linked back to the main query e.g.

SELECT T1.ROW_ID, T1.NAME
,(SELECT COUNT(1) 
  FROM TABLE2 T2 
  WHERE T2.PARENT_ID = T1.ROW_ID
) COUNT_T2
FROM TABLE1 T1

The subquery is correlated back to the main query using WHERE T2.PARENT_ID = T1.ROW_ID. So for each row in T1, the count of matching records in T2 is calculated.

You can have non-correlated subqueries e.g.

SELECT T1.ROW_ID, T1.NAME
,(SELECT COUNT(1) 
  FROM TABLE2 T2 
) COUNT_T2
FROM TABLE1 T1

In this case, every record in the resultset will have the same value in the COUNT_T2

NickW
  • 8,430
  • 2
  • 6
  • 19