I'm working with the patents-public-data.patents.publications_201710 table in Google's BigQuery. My goal is to identify potential corporate-owned patents. Since an inventor must be an individual and not a corporation, I can theoretically do so by identifying records where the assignee is not found in the list of inventors. The tricky part is that there are often multiple inventors and even multiple assignees for the same publication_number.
The table originally looks like:
Row publication_number assignee inventor
1 US-7011573-B2 Mcarthur Richard C MCARTHUR RICHARD C.
Holmes Dennis G HOLMES DENNIS G.
Mcarthur Ronald J MCARTHUR RONALD J.
2 US-8746747-B2 IPS Corporation—Weld-On Division MCPHERSON TERRY R
I've tried the following (unsuccessful) query:
#standard sql
SELECT
p.publication_number,
assignee,
inventor
FROM
`patents-public-data.patents.publications_201710` AS p,
p.assignee assignee,
p.inventor inventor
WHERE
assignee not in (inventor) #want to run this within each p.publication_number-assignee group somehow
AND p.publication_number IN ('US-8746747-B2',
'US-7011573-B2')
This query produces the following output:
Row publication_number assignee inventor
1 US-7011573-B2 Mcarthur Richard C MCARTHUR RICHARD C.
2 US-7011573-B2 Mcarthur Richard C HOLMES DENNIS G.
3 US-7011573-B2 Mcarthur Richard C MCARTHUR RONALD J.
4 US-7011573-B2 Holmes Dennis G MCARTHUR RICHARD C.
5 US-7011573-B2 Holmes Dennis G HOLMES DENNIS G.
6 US-7011573-B2 Holmes Dennis G MCARTHUR RONALD J.
7 US-7011573-B2 Mcarthur Ronald J MCARTHUR RICHARD C.
8 US-7011573-B2 Mcarthur Ronald J HOLMES DENNIS G.
9 US-7011573-B2 Mcarthur Ronald J MCARTHUR RONALD J.
10 US-8746747-B2 IPS Corporation—Weld-On Division MCPHERSON TERRY R
First, I'd like to see if the assignee is contained in (not just equal to) the inventor variable since assignee omits the "." after the middle initial. Frequently the string in the assignee variable equals the inventor, but not always. Some examples of exceptions are:
- assignee: "Daryl A. KRUPA"; inventor: "KRUPA Daryl A."
- assignee: "KRUPADANAM Gazula Levi DAVID"; inventor: "DAVID KRUPADANAM, Gazula Levi"
I know I'll end up having to live with some false positives (record is actually assigned to an individual and not a corporation), but if I can address some of these issues through my query I'd prefer that.
Second, assuming the first concern is addressed, I can't just see if the assignee is contained in the inventor variable because that would inappropriately retain rows 2-4, 6-8.
My desired output is simply:
Row publication_number assignee inventor
1 US-8746747-B2 IPS Corporation—Weld-On Division MCPHERSON TERRY R
I've found somewhat relevant posts here and here, but they don't fully address my question or don't use SQL.
What is the appropriate standard SQL query for my desired output?