0

There are two tables:

Table education_data (list of countries with values by year per measured indicator).

create table education_data 
(country_id int,
indicator_id int,
year date,
value float
);

Table indicators (list of all indicators):

create table indicators
(id int PRIMARY KEY,
name varchar(200),
code varchar(25)
);

I want to find the indicators for which the highest number of countries lack information entirely i.e. max (count of missing indicators by country)

I have solved the problem in excel (by counting blanks in a pivot table by country)

pivot table with count for missing indicators by country

I haven't figured our yet the SQL query to return the same results.

I am able to return the number of missing indicators for a set country , read query below, but not for all countries.

SELECT COUNT(*)
FROM education_data AS edu
RIGHT JOIN indicators AS ind ON
edu.indicator_id = ind.id and country_id = 10 
WHERE value IS NULL 
GROUP BY country_id

I have tried with a cross join without success so far.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Basu23
  • 3
  • 2

2 Answers2

0

I want to find the indicators for which the highest number of countries lack information entirely i.e. max (count of missing indicators by country)

That's a logical contradiction. The ...

count of missing indicators by country

.. cannot be pinned on any specific indicators, since those countries don't have an indicator.

The counts per country with "missing indicator" (i.e. indicator_id IS NULL):

SELECT country_id, count(*) AS ct_indicator_null
FROM   education_data
WHERE  indicator_id IS NULL
GROUP  BY country_id
ORDER  BY count(*) DESC;

Or, more generally, without valid indicator, which also includes rows where indicator_id has no match in table indicators:

SELECT country_id, count(*) AS ct_no_valid_indicator
FROM   education_data e 
WHERE  NOT EXISTS (
   SELECT FROM indicators i
   WHERE  i.id = e.indicator_id
   )
GROUP  BY country_id
ORDER  BY count(*) DESC;

NOT EXISTS is one of four basic techniques that apply here (LEFT / RIGHT JOIN, like you tried being another one). See:

You mentioned a country table. Countries without any indicator entries in education_data are not included in the result above. To find those, too:

SELECT *
FROM   country c
WHERE  NOT EXISTS (
   SELECT
   FROM   education_data e
   JOIN   indicators     i ON i.id = e.indicator_id  -- INNER JOIN this time!
   WHERE  e.country_id = c.id
   );

Reports countries without valid indicator (none, or not valid).


If every country should have a valid indicator, after cleaning up existing data, consider:

.. which brings you closer to a valid many-to-many implementation. See:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
0

You will have to join on the contries as well, otherwise you can not tell if a contry has no entry in education_data at all:

create table countries(id serial primary key, name varchar);
create table indicators
(id int PRIMARY KEY,
name varchar(200),
code varchar(25)
);
create table education_data 
(country_id int references countries,
indicator_id int references indicators,
year date,
value float
);

insert into countries values (1,'USA');
insert into countries values (2,'Norway');
insert into countries values (3,'France');
insert into indicators values (1,'foo','xxx');
insert into indicators values (2,'bar', 'yyy');
insert into education_data  values(1,1,'01-01-2020',1.1);

SELECT count (c.id), i.id, i.name
FROM countries c JOIN indicators i ON (true) LEFT JOIN education_data e ON(c.id = e.country_id AND i.id = e.indicator_id) 
WHERE indicator_id IS NULL
GROUP BY i.id;


count | id | name 
-------+----+------
     3 |  2 | bar
     2 |  1 | foo
(2 rows)
clamp
  • 2,552
  • 1
  • 6
  • 16
  • absoutely right. That's the right answer. If I wanted to add another condition: only countries with c.is_country field =True how would I change the left outer join ? – Basu23 Jan 26 '20 at 16:33
  • You can add it to the `WHERE` clause or to the (first) `JOIN` condition. – clamp Jan 26 '20 at 17:00
  • I just wanted to double check.it thanks. The question wasn't that difficult but my synapses are burned out. – Basu23 Jan 26 '20 at 17:24