
I have a problem with the web scraping code below. The code works, but if the entered product is not just a single word and also contains, for example, a number, like "Playstation 4", it fails. The problem seems to be in this line: `if product in str(product_name):`

I tried many different variations like `product_name.text` or `product_name.string`, but it won't correctly check whether the string `product` is in the converted object `product_name` if it is not just one word.

If I use `print(product_name.text)` I get exactly the result that I would expect, so why can't I use the `if ... in` statement correctly with `product_name.text` or `str(product_name)`?

import requests
from bs4 import BeautifulSoup

product = input("Please enter product: ")

URL = "http://www.somewebsite.com/search?sSearch=" + product

website = requests.get(URL)

html = BeautifulSoup(website.text, 'html.parser')

product_info = html.find_all('div', class_="product--main")


product_array = []
for product_details in product_info:
    product_name = product_details.find('a', class_="product--title product--title-desktop")
    if product in str(product_name):
        product_array.append(product_name.text.replace('\n', '')+'; ')
        discounted_price = product_details.find('span', class_="price--default is--discount")
        if discounted_price:
            product_array.append(discounted_price.text.replace('\n', '').replace('\xa0€*','').replace('from','') + ';\n')
        else:
            regular_price = product_details.find('span', class_="price--default")
            product_array.append(regular_price.text.replace('\n', '').replace('\xa0€*','').replace('from','') + ';\n' if regular_price else 'N/A;\n')

with open("data.csv", "w") as text_file:
    text_file.write("product; price;\n")
    for row in product_array:
        text_file.write(row)
Thomas
  • please add the values of `product` and `product_name` for the failing test – Krishna Chaurasia Feb 02 '21 at 16:15
  • Check type(product_name) then you will see. You can use iin or not – Elvin Jafarov Feb 02 '21 at 16:16
  • Try removing code to narrow it down to a [mcve]. – Brian McCutchon Feb 02 '21 at 16:18
  • Also, if there are spaces, perhaps switch to `if re.match(.....):` instead, as that will definitely allow for spaces between words in pattern matching – QHarr Feb 02 '21 at 16:19
  • thx for all the answers... the value of `product` for example is `"Playstation 4"` and the value of `product_name` are scraped product names like `
    Playstation 4
    ` and `
    Playstation 5
    ` and I want to sort out the Playstation 5 result. Using `type(product_name)` returns ``. What do you mean with iin or not? I had stripped all the code after `if product in str(product_name):` and replaced it with a simple print to find my mistake, but I don't get why it's not working after the string conversion. Thx for the tip, I will try it with `re.match`.
    – Thomas Feb 02 '21 at 16:58
  • What is the actual url please? – QHarr Feb 02 '21 at 17:05
  • Maybe check for Unicode characters? – Brian McCutchon Feb 02 '21 at 17:36
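
Following up on the comments about checking the type and looking for Unicode characters, here is a minimal sketch of how the comparison could be debugged. It assumes the scraped text carries surrounding newlines or a non-breaking space; the `scraped` value is made up for illustration and `normalize` is a hypothetical helper, not something from the original code.

```python
import unicodedata


def normalize(text: str) -> str:
    """Fold compatibility characters, collapse whitespace runs, and ignore case."""
    folded = unicodedata.normalize("NFKC", text)
    return " ".join(folded.split()).casefold()


# Hypothetical scraped value: newlines around the name and a non-breaking
# space (\xa0) between the words instead of a regular space.
scraped = "\n  Playstation\xa04 \n"
product = "Playstation 4"

print(repr(scraped))                              # repr() makes hidden characters visible
print(product in scraped)                         # plain substring test
print(normalize(product) in normalize(scraped))   # test after normalizing both sides
```

If the plain test fails while the normalized one succeeds, the mismatch comes from whitespace or Unicode characters in the scraped text rather than from the `in` operator itself.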

1 Answer


Why should I use urlencode?

I tried many different variations like product_name.text or product_name.string, but it won't correctly check if the string product is in the converted object product_name if it is... not just one word.

URL = "http://www.somewebsite.com/search?sSearch=" + product

Please look at what happens to the query string when you build the URL by concatenation (screenshot omitted).


So please consider building the query string with urlencode instead of concatenation.
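
Since the original screenshots are not available, here is a minimal sketch of the difference. It assumes the search endpoint expects a form-encoded `sSearch` parameter; the URL is the placeholder from the question and `build_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.somewebsite.com/search"  # placeholder URL from the question


def build_search_url(product: str) -> str:
    """Return the search URL with the product name encoded as a query string."""
    return BASE_URL + "?" + urlencode({"sSearch": product})


# Plain concatenation leaves the space in the URL untouched.
concatenated = BASE_URL + "?sSearch=" + "Playstation 4"
# urlencode escapes the space (and any other reserved characters).
encoded = build_search_url("Playstation 4")

print(concatenated)  # http://www.somewebsite.com/search?sSearch=Playstation 4
print(encoded)       # http://www.somewebsite.com/search?sSearch=Playstation+4
```

With `requests`, passing `params={"sSearch": product}` to `requests.get` should have the same effect, since the library encodes the dictionary for you.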

madbird