
I am looking at the PatentsView API, and it's unclear how to retrieve the full text of a patent. It exposes only detail_desc_length, not the actual detailed description.

I would like to perform the following on both the patent_abstract and the "detailed_description".

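import json
import httpx

# q is the filter, o the paging options; both are JSON-encoded query parameters
query = {"_and": [{"_gte": {"patent_date": "2001-01-01"}}, {"_text_any": {"patent_abstract": "radar aircraft"}}, {"_neq": {"assignee_lastknown_country": "US"}}]}
params = {"q": json.dumps(query), "o": json.dumps({"per_page": 1000})}
r = httpx.get("https://api.patentsview.org/patents/query", params=params)
r.json()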
0x90
  • From the first paragraph of the API FAQ page [here](https://patentsview.org/apis/api-faqs), it seems that this API provides metadata rather than the full patent. Maybe you can extract parameters like the patent number from this API and use Google Patents to retrieve the PDF with the full text. – Divyesh Peshavaria Aug 24 '21 at 06:53
  • Given a patent ID from PatentsView (e.g. 10005543) you can do this:- https://patents.google.com/patent/US10005543B2/en?oq=10005543 –  Aug 24 '21 at 07:16
  • @DarkKnight is there an API that can retrieve it? I want to do stuff as automated and clean as possible – 0x90 Aug 24 '21 at 07:18
  • That _is_ an API, albeit a crude one. – tripleee Aug 24 '21 at 07:28
  • There may be a specific API for Google patents - I don't know. However, with the aid of BeautifulSoup you could analyse the response from the Google URL example I showed and extract the description –  Aug 24 '21 at 07:31
  • @DarkKnight true. Yet, for future visitors of this page: it's better to work directly with an API than to scrape the Google Patents UI. One can use BigQuery, for instance. But for simple projects, scraping is indeed sufficient. To read more about bs4 and Google Patents, please refer to: https://stackoverflow.com/questions/64097675/google-patents-scraping-with-beautiful-soup – 0x90 Aug 24 '21 at 08:08
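
The scraping route discussed in the comments above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names (`google_patents_url`, `extract_section`) are made up for this example, the default kind code "B2" is a guess that will not fit every patent, and the `itemprop="description"` / `itemprop="abstract"` markers are an assumption about the Google Patents page markup that may change without notice. It uses the standard-library HTML parser instead of BeautifulSoup so that it has no extra dependencies.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def google_patents_url(patent_number: str, kind_code: str = "B2") -> str:
    """Build a Google Patents URL for a US patent (kind code is an assumption)."""
    return f"https://patents.google.com/patent/US{patent_number}{kind_code}/en"


class SectionText(HTMLParser):
    """Collect the text inside the first element whose itemprop matches."""

    def __init__(self, itemprop: str):
        super().__init__()
        self.itemprop = itemprop
        self.depth = 0       # > 0 while inside the target element
        self.done = False    # True once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("itemprop") == self.itemprop:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_section(html: str, itemprop: str = "description") -> str:
    """Return the whitespace-joined text of the section marked with itemprop."""
    parser = SectionText(itemprop)
    parser.feed(html)
    return " ".join(parser.chunks)
```

With a patent number taken from a PatentsView result, one would fetch the page with something like `httpx.get(google_patents_url("10005543")).text` and pass the HTML to `extract_section`. The parser assumes reasonably well-formed markup, so unclosed tags inside the section would throw off the depth tracking.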

1 Answer


You should take a look at patent_client! It's a Python module that searches the live USPTO and EPO databases using a Django-style API. The results of any query can then be cast into pandas DataFrames or Series with a simple .to_pandas() call.

from patent_client import Patent

result = Patent.objects.filter(issue_date__gt="2001-01-01", abstract="radar aircraft")

# That provides an iterator of Patent objects that match the query.
# You can grab abstracts and detailed descriptions like this:

for patent in result:
    print(patent.abstract)
    print(patent.description)

# or you can just load it up in a Pandas dataframe:

result.values("publication_number", "abstract", "description").to_pandas()

# Produces a Pandas dataframe with columns for the patent number, abstract, and description.


A great place to start is the User Guide Introduction



(Full disclosure - I'm the author and maintainer of patent_client)

Parker Hancock