0

I want to collect data by application_number like this. The real application number is CN 201510747352.

SELECT c.application_number AS Pub, COUNT(p.publication_number) AS CitedBy 
     FROM `patents-public-data.patents.publications` AS p, UNNEST(citation) AS c 
     WHERE c.application_number IN ('CN-201510747352-A') 
     GROUP BY c.application_number

But it can't work. The url is the patent page. Who can do me a favor? https://patents.google.com/patent/CN105233911B/zh?oq=CN201510747352.8

Taher A. Ghaleb
  • 5,120
  • 5
  • 31
  • 44
LEBRON L
  • 17
  • 6

1 Answers1

1

My guess is that patent can be cited after it's status is Application - so instead of using initial number CN-201510747352 - you should use app/pub number when status is Application - Also you need to apply not only distinct counting but also excluding counting same app with -A or -B or etc. suffix - that is why you will see use of the regex_extract function

#standardSQL
SELECT 
  c.publication_number AS Pub, 
  COUNT(DISTINCT REGEXP_EXTRACT(p.publication_number, r'(.+-.+)-')) AS CitedByCount
FROM `patents-public-data.patents.publications` AS p, 
UNNEST(citation) AS c 
WHERE c.publication_number LIKE ('CN-105233911%') 
GROUP BY c.publication_number  

with result

Row Pub             CitedBy  
1   CN-105233911-A  10   

... If I only have the application data, how can I realize it?

#standardSQL
SELECT 
  c.publication_number AS Pub, 
  COUNT(DISTINCT REGEXP_EXTRACT(p.publication_number, r'(.+-.+)-')) AS CitedByCount
FROM `patents-public-data.patents.publications` AS p, 
UNNEST(citation) AS c 
WHERE c.publication_number IN (
  SELECT publication_number 
  FROM `patents-public-data.patents.publications`
  WHERE application_number IN ('CN-201510747352-A') 
)
GROUP BY c.publication_number 
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • If I only have the application data, how can I realize it? – LEBRON L Nov 27 '18 at 06:50
  • Would you mind helping me further? Is it hard to be realized? – LEBRON L Nov 27 '18 at 07:49
  • I find there are two application numbers in patent page. one is in priority applications and the other one is in Applications Claiming Priority. The data I gathered is in Applications Claiming Priority. Sometimes it cannot work out, such as ZA200803637A in https://patents.google.com/patent/CN100573476C/fr. Are there any way to solve the problem? – LEBRON L Nov 28 '18 at 01:35
  • I'm sorry for finding something wrong about the order you edited. Take 201020059348 for example, there 8 Citedby , but the order give a wrong answer. What's the reason it happened? – LEBRON L Nov 28 '18 at 07:30
  • link:https://patents.google.com/patent/CN201728346U/zh?oq=CN201020059348 – LEBRON L Nov 28 '18 at 07:30