
I need to get a large number of publication numbers from Google Patents, for example US7863316B2 and KR102121633B1. I tried scraping the data with classic Python tools (like BeautifulSoup), but that approach doesn't work with Google. Then I switched to Google Cloud BigQuery and got some results, but before I fully understood how to use the platform, I got this error: `Quota exceeded: Your project exceeded quota for free query bytes scanned`. This is the code I was using to get the data:


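```python
from google.cloud import bigquery

# Assumes Google Cloud credentials and a default project are already configured.
client = bigquery.Client()

search_term = "epilepsy"

def create_query():
  # Randomly sample ~1000 publications whose top terms include @search_term
  # and that were granted before 2000-01-01.
  q = r'''
  WITH
  pubs AS (
    SELECT DISTINCT
      pub.publication_number
    FROM `patents-public-data.patents.publications` pub
      INNER JOIN `patents-public-data.google_patents_research.publications` gpr ON
        pub.publication_number = gpr.publication_number
    WHERE
      @search_term IN UNNEST(gpr.top_terms)
      AND pub.grant_date < 20000101
  )

  SELECT
    publication_number, url
  FROM
    `patents-public-data.google_patents_research.publications`
  WHERE
    publication_number IN (SELECT publication_number FROM pubs)
    -- SAFE_DIVIDE returns NULL instead of failing when pubs is empty.
    AND RAND() <= SAFE_DIVIDE(1000, (SELECT COUNT(*) FROM pubs))
  '''
  return q

# Pass the search term as a query parameter instead of hard-coding it in the SQL.
job_config = bigquery.QueryJobConfig(
  query_parameters=[bigquery.ScalarQueryParameter("search_term", "STRING", search_term)]
)
df = client.query(create_query(), job_config=job_config).to_dataframe()

if len(df) == 0:
  raise ValueError('No results for your search term. Retry with another term.')
else:
  print('Search complete for search term: "{}". {} random assets selected.'
        .format(search_term, len(df)))

# df only has publication_number and url (embedding_v1 was never selected),
# so collect just the publication numbers.
publication_numbers = df.publication_number.tolist()

print(df.head())
```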
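BigQuery bills on-demand queries by the bytes read from the columns a query references, not by the number of rows it returns, so the `RAND()` filter and the ~1000-row sample do not make the query above any cheaper. A dry run reports how many bytes a query would process without executing it. A minimal sketch, assuming the `google-cloud-bigquery` client library and default credentials; `estimate_query_bytes` and `format_bytes` are hypothetical helper names, not part of the library:

```python
from __future__ import annotations


def format_bytes(n: int) -> str:
    """Render a byte count with binary units, e.g. 1536 -> '1.5 KiB'."""
    value = float(n)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if value < 1024 or unit == "TiB":
            return f"{value:.1f} {unit}"
        value /= 1024
    raise AssertionError("unreachable")


def estimate_query_bytes(sql: str, query_parameters: list | None = None) -> int:
    """Dry-run `sql` and return the bytes BigQuery reports it would process."""
    # Imported lazily so format_bytes() can be used without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        dry_run=True,           # validate and estimate only; the query is not executed
        use_query_cache=False,  # estimate the uncached cost
        query_parameters=query_parameters or [],
    )
    job = client.query(sql, job_config=job_config)
    return job.total_bytes_processed
```

Usage would be along the lines of `print(format_bytes(estimate_query_bytes(sql)))`, where `sql` is the query string; if it uses `@` parameters, pass the same `ScalarQueryParameter` list used for the real run.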

Are there other ways to get the information I need?
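One possible direction, sketched under assumptions: since only the numbers are needed, a query can select just `publication_number`, skip the `url` column and the random-sampling step (so it returns every match rather than ~1000), and set `maximum_bytes_billed` so BigQuery fails the job instead of consuming quota. `NUMBERS_ONLY_SQL`, `fetch_publication_numbers`, `to_google_patents_id`, and the 100 GiB cap are hypothetical. The conversion assumes BigQuery stores numbers like `US-7863316-B2` while Google Patents uses `US7863316B2`. Reading fewer columns should scan fewer bytes, but a dry run is the way to confirm it:

```python
from __future__ import annotations

# Hypothetical numbers-only variant: keeps the original pre-2000 grant filter
# but reads no url column and references the research table only once.
NUMBERS_ONLY_SQL = """
SELECT DISTINCT pub.publication_number
FROM `patents-public-data.patents.publications` AS pub
JOIN `patents-public-data.google_patents_research.publications` AS gpr
  ON pub.publication_number = gpr.publication_number
WHERE @search_term IN UNNEST(gpr.top_terms)
  AND pub.grant_date < @before_date
"""


def to_google_patents_id(publication_number: str) -> str:
    """Convert BigQuery's "US-7863316-B2" form to Google Patents' "US7863316B2"."""
    return publication_number.strip().replace("-", "")


def fetch_publication_numbers(search_term: str, before_date: int = 20000101,
                              max_bytes_billed: int = 100 * 1024 ** 3) -> list[str]:
    """Run NUMBERS_ONLY_SQL and return compact publication numbers."""
    # Imported lazily so to_google_patents_id() works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("search_term", "STRING", search_term),
            bigquery.ScalarQueryParameter("before_date", "INT64", before_date),
        ],
        # The job fails instead of running if it would bill more than this.
        # 100 GiB is an arbitrary placeholder; set it from a dry-run estimate.
        maximum_bytes_billed=max_bytes_billed,
    )
    rows = client.query(NUMBERS_ONLY_SQL, job_config=job_config).result()
    return [to_google_patents_id(row.publication_number) for row in rows]
```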
Alona
  • Welcome to Stack Overflow! What's your question? – Stef Aug 27 '20 at 13:34
  • You get 1 TB free per month on the free tier. Anything over that and you'll have to start paying, i.e. enter your credit card details. – Graham Polley Aug 27 '20 at 13:38
  • Does this answer your question? [Big Query Error: Your project exceeded quota for free query bytes scanned](https://stackoverflow.com/questions/37173062/big-query-error-your-project-exceeded-quota-for-free-query-bytes-scanned) – Graham Polley Aug 27 '20 at 13:39
  • @Stef Thank you! My question is: are there other ways to get the information I need? Am I writing the query correctly? – Alona Aug 27 '20 at 14:08
  • @Graham Polley Yes, I know. But what counts as one query? A request to scrape one patent? I don't believe I requested 1 TB of data. And I don't need all patents; my target is to get only the patent numbers. Is my query correct for that? – Alona Aug 27 '20 at 14:13
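On what counts as one query: each `client.query(...)` call submits one SQL job, and on-demand pricing generally charges that job for the full size of every referenced column in the tables it reads, however few rows come back. With large public tables, a handful of runs can use up the monthly free tier. A back-of-the-envelope sketch, assuming the free tier is 1 TiB of query bytes per month (check the current BigQuery pricing page); `runs_per_month` is a hypothetical helper and the sizes below are made up, not measurements of this query:

```python
# Assumption: 1 TiB of free query bytes per month; verify against current pricing.
FREE_TIER_BYTES = 1024 ** 4
GIB = 1024 ** 3


def runs_per_month(bytes_per_run: int, free_bytes: int = FREE_TIER_BYTES) -> int:
    """Number of full runs of a query of this size that fit in the free tier."""
    if bytes_per_run <= 0:
        raise ValueError("bytes_per_run must be positive")
    return free_bytes // bytes_per_run


if __name__ == "__main__":
    # Illustrative sizes only; a dry run gives the real figure for a query.
    for size_gib in (10, 100, 300):
        print(f"{size_gib} GiB per run -> {runs_per_month(size_gib * GIB)} runs per month")
```

For example, a hypothetical 300 GiB query fits only three times (1024 // 300 = 3), even if each run returns just ~1000 rows.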
