I'm trying to automate part of a patent search. I'd like to collect all the links returned for a search query. Specifically, I'm interested in Apple patents from 2015 onward. Here is the code:
import selenium
from selenium import webdriver
from selenium.webdriver.firefox.options import Options as options
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.common.by import By
new_driver_path = r"C:/Users/alexe/Desktop/Apple/PatentSearch/geckodriver-v0.30.0-win64/geckodriver.exe"
ops = options()
serv = Service(new_driver_path)
browser1 = selenium.webdriver.Firefox(service=serv, options=ops)
browser1.get("https://patents.google.com/?assignee=apple&after=priority:20150101&sort=new")
elements = browser1.find_elements(By.CLASS_NAME, "search-result-item")
links = []
for elem in elements:
    href = elem.get_attribute('href')
    if href:
        links.append(href)
links = set(links)
for href in links:
    print(href)
The output is:
https://patentimages.storage.googleapis.com/ed/06/50/67e30960a7f68d/JP2021152951A.pdf
https://patentimages.storage.googleapis.com/86/30/47/7bc39ddf0e1ea7/KR20210106968A.pdf
https://patentimages.storage.googleapis.com/ca/2a/bc/9380e1657c2767/US20210318798A1.pdf
https://patentimages.storage.googleapis.com/c1/1a/c6/024f785fd5ea10/AU2021204695A1.pdf
https://patentimages.storage.googleapis.com/b3/19/cc/8dc1fae714194f/US20210312694A1.pdf
https://patentimages.storage.googleapis.com/e6/16/c0/292a198e6f1197/AU2021218193A1.pdf
https://patentimages.storage.googleapis.com/3e/77/e0/b59cf47c2b30a1/AU2021212005A1.pdf
https://patentimages.storage.googleapis.com/1b/3d/c2/ad77a8c9724fbc/AU2021204422A1.pdf
https://patentimages.storage.googleapis.com/ad/bc/0f/d1fcc65e53963e/US20210314041A1.pdf
The problem is that one link is missing:
[Screenshot: result item and the missing link]
I've tried different selectors and got the same result: one link is still missing. I've also tried searching with different parameters, and the pattern is consistent: every missing link belongs to a result that has no PDF. I've spent a lot of time trying to figure out the reason, so I would be really grateful if you could give me any clue. Thanks in advance!
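To make the pattern above easier to check, here is a minimal sketch of a helper that groups the collected `href` values into missing, PDF, and other links. The name `summarize_hrefs` and the sample list are hypothetical and only for illustration; the sketch says nothing about how the page itself is structured.

```python
from typing import Iterable, Optional


def summarize_hrefs(hrefs: Iterable[Optional[str]]) -> dict:
    """Group href values by whether they are missing, PDF links, or other links."""
    summary = {"missing": 0, "pdf": [], "other": []}
    for href in hrefs:
        if not href:
            # The element matched the selector but exposes no href attribute.
            summary["missing"] += 1
        elif href.lower().endswith(".pdf"):
            summary["pdf"].append(href)
        else:
            summary["other"].append(href)
    return summary


# Hypothetical sample: two PDF links and one element without an href.
sample = [
    "https://patentimages.storage.googleapis.com/ca/2a/bc/9380e1657c2767/US20210318798A1.pdf",
    None,
    "https://patentimages.storage.googleapis.com/b3/19/cc/8dc1fae714194f/US20210312694A1.pdf",
]
result = summarize_hrefs(sample)
print(result["missing"], len(result["pdf"]), len(result["other"]))
```

With the Selenium code above, the input could be built as `[e.get_attribute("href") for e in elements]`. A non-zero `missing` count would suggest that some matched elements carry no `href` at all, rather than being skipped by the selector.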