
I have a database with the ISBN numbers of different books, which I gathered using Python and BeautifulSoup. Next I would like to add categories to the books. There is a standard for book categories, and the website https://www.bol.com/nl/ lists all the books with their categories according to that standard.

Start URL: https://www.bol.com/nl/

ISBN: 9780062457738

URL after search: https://www.bol.com/nl/p/the-subtle-art-of-not-giving-a-f-ck/9200000053655943/

HTML class of categories: <li class="breadcrumbs__item">

Does anyone know how to (1) enter the ISBN in the search bar and (2) submit the search query, so that I can use the resulting page for scraping?

Step (3), scraping all the categories, is something I can do. It's the first two steps I don't know how to do.

Code that I have so far for steps (1) and (2)

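from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

webpage = "https://www.bol.com/nl/"  # edit me
searchterm = "9780062457738"  # edit me

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get(webpage)

# Step (1): type the ISBN into the search box (locator copied from the page source; it may have changed)
sbox = driver.find_element(By.CLASS_NAME, "appliedSearchContextId")
sbox.send_keys(searchterm)

# Step (2): a compound class name needs a CSS selector; By.CLASS_NAME accepts only one class
submit = driver.find_element(By.CSS_SELECTOR, ".wsp-search__btn.tst_headerSearchButton")
submit.click()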

Code that I have so far for step (3)

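import requests
from bs4 import BeautifulSoup

url = 'https://www.bol.com/nl/p/the-subtle-art-of-not-giving-a-f-ck/9200000053655943/'
data = requests.get(url)
data.raise_for_status()  # stop early on a 4xx/5xx response

soup = BeautifulSoup(data.text, 'html.parser')

# The categories sit in the breadcrumb trail (<li class="breadcrumbs__item"> items)
categoryBar = soup.find('ul', {'class': 'breadcrumbs breadcrumbs--show-last-item-small'})

if categoryBar is not None:
    for category in categoryBar.find_all('span', {'class': 'breadcrumbs__link-label'}):
        print(category.text.strip())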
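Since steps (1) and (2) only need to produce a page, it can help to write step (3) as a function of an HTML string, so the same parser works on `requests.get(...).text` or on Selenium's `driver.page_source`. Below is a minimal, dependency-free sketch that swaps BeautifulSoup for the standard library's `html.parser`; `BreadcrumbParser`, `parse_categories` and the sample markup are hypothetical, modelled only on the class names given in the question, and it assumes the label spans contain no nested `<span>` elements.

```python
from html.parser import HTMLParser


class BreadcrumbParser(HTMLParser):
    """Collect the text of every <span class="breadcrumbs__link-label">.

    Hypothetical helper: assumes the breadcrumb markup described in the
    question and that label spans contain no nested <span> elements.
    """

    def __init__(self):
        super().__init__()
        self.labels = []
        self._in_label = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "breadcrumbs__link-label" in classes:
            self._in_label = True
            self.labels.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self._in_label:
            self._in_label = False

    def handle_data(self, data):
        if self._in_label:
            self.labels[-1] += data


def parse_categories(html):
    """Return the non-empty breadcrumb labels in an HTML string, stripped."""
    parser = BreadcrumbParser()
    parser.feed(html)
    parser.close()
    return [label.strip() for label in parser.labels if label.strip()]


# Made-up markup using the class names from the question.
sample = """
<ul class="breadcrumbs breadcrumbs--show-last-item-small">
  <li class="breadcrumbs__item"><a href="#"><span class="breadcrumbs__link-label">Level 1</span></a></li>
  <li class="breadcrumbs__item"><a href="#"><span class="breadcrumbs__link-label">Level 2</span></a></li>
</ul>
"""
print(parse_categories(sample))  # ['Level 1', 'Level 2']
```

With Selenium, the same call would be `parse_categories(driver.page_source)` once the product page has loaded.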
T. de Jong
  • What have you tried so far, and what error are you getting? – Dev Sep 21 '19 at 12:24
  • @Dev I don't get any errors. I just don't know where to start. The code for (2) is from the internet, but I don't know how to use webdriver properly. Do you know how to do this? – T. de Jong Sep 21 '19 at 13:34
  • This question is great and shows a lot of prior work. Still, I think you could have split it into two separate questions, one for step 1 and one for step 2. – saQuist May 10 '22 at 09:18

2 Answers


You can use selenium to locate the input box and loop over your ISBNs, entering each:

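from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

d = webdriver.Chrome(service=Service('/path/to/chromedriver'))
books = ['9780062457738']
for book in books:
    d.get('https://www.bol.com/nl/')
    e = d.find_element(By.ID, 'searchfor')  # the site's search input
    e.send_keys(book)
    e.send_keys(Keys.ENTER)  # submits the search form
    # scrape page here, e.g. from d.page_source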

For each ISBN in books, the loop enters the value into the search box and loads the resulting page, ready to be scraped.
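The loop above leaves `#scrape page here` open. Since the goal is to add a category to each book in the database, one option is to collect the results per ISBN first and write them back afterwards. A minimal sketch, assuming a hypothetical `categorize_all` helper and a `lookup` callable you supply (for example one wrapping the Selenium steps above, or the `get_category` function from the other answer):

```python
import time


def categorize_all(isbns, lookup, delay=1.0):
    """Return a dict mapping each ISBN to lookup(isbn).

    Hypothetical helper.
    lookup: any callable taking an ISBN string and returning a category.
    delay: seconds to wait between consecutive lookups, to go easy on the site.
    Duplicate ISBNs are looked up only once.
    """
    results = {}
    for isbn in isbns:
        if isbn in results:
            continue  # already looked up
        if results and delay:
            time.sleep(delay)  # pause only between lookups, not before the first
        results[isbn] = lookup(isbn)
    return results


# Usage with a stand-in lookup; replace it with a real scraper.
print(categorize_all(["9780062457738", "9780062457738"], lambda isbn: "unknown", delay=0))
# {'9780062457738': 'unknown'}
```

Keeping the lookup as a parameter means the browser-based and the requests-based approaches can be swapped without touching the code that stores the results.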

Ajax1234

You could write a function that returns the category. Base it on the request the site's own search makes: tidy up the query parameters and send a plain GET request, with no browser needed.

import requests
from bs4 import BeautifulSoup as bs

def get_category(isbn):
    # The same search request the site sends, reduced to the parameters that matter
    r = requests.get(f'https://www.bol.com/nl/rnwy/search.html?Ntt={isbn}&searchContext=books_all')
    soup = bs(r.content, 'lxml')  # requires the lxml package
    # Label of the last <li> in the #option_block_4 list
    category = soup.select_one('#option_block_4 > li:last-child .breadcrumbs__link-label')

    if category is None:
        return 'Not found'
    return category.text.strip()

isbns = ['9780141311357', '9780062457738', '9780141199078']

for isbn in isbns:
    print(get_category(isbn))
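If the ISBNs come straight from a database, building the query string with the standard library's `urllib.parse.urlencode` keeps it correctly encoded even when a value contains spaces or other special characters. A small sketch; `build_search_url` is a hypothetical helper, and the endpoint and parameter names are simply the ones used in the answer above, which bol.com may change:

```python
from urllib.parse import urlencode

# Endpoint and parameter names taken from the answer above; they may change.
SEARCH_URL = "https://www.bol.com/nl/rnwy/search.html"


def build_search_url(isbn, context="books_all"):
    """Return the GET search URL for one ISBN (hypothetical helper)."""
    query = urlencode({"Ntt": isbn, "searchContext": context})
    return f"{SEARCH_URL}?{query}"


for isbn in ["9780141311357", "9780062457738"]:
    print(build_search_url(isbn))
# first line: https://www.bol.com/nl/rnwy/search.html?Ntt=9780141311357&searchContext=books_all
```

Alternatively, passing the same dictionary as `requests.get(SEARCH_URL, params={...})` lets requests do this encoding for you.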
QHarr