I need to join two tables: our_sample
and tls207_pers_appln
from PATSTAT.
our_sample
' has 4 columns: appln_id, appln_auth, appln_nr, appln_kind.
tls207_pers_appln
has 4 columns: appln_id, person_id, applt_seq_nr, invt_seq_nr.
our_sample
has 2191 rows and some (60) of them are missing in tls207_pers_appln
.
Because I want to join the tables keeping all the appln_id in our_sample
(even if they do not have the matching information from tls207_pers_appln
) I join the two tables doing a RIGHT JOIN.
However, the resulting view t2_tot_in_patent
has only 2096 appln_id.
This is in part due to the restriction I put (35 patents are dropped because I select only those HAVING MAX(invt_seq_nr) > 0
, which is fine). But this would yield 2191-35 = 2156 patents.
Instead, I get 2096 of them that is: 2191 (in our_sample) - 60 (appln_ids in our_sample that miss from tls207) - 35 (appln_ids for which invt_seq_nr = 0).
BUT the whole point of using RIGHT JOIN is that I should not loose those 60 patents. Why then?
-- compiling total count of inventors per patent: t2_tot_in_patent
DROP VIEW IF EXISTS t2_tot_in_patent;
CREATE VIEW t2_tot_in_patent AS
SELECT m.appln_id, MAX(invt_seq_nr) AS tot_in_patent
FROM patstat2022a.tls207_pers_appln AS t7
RIGHT OUTER JOIN cecilia.our_sample AS m
ON t7.appln_id = m.appln_id
GROUP BY appln_id
HAVING MAX(invt_seq_nr) > 0