
I'm learning Python and BeautifulSoup (bs4).

Following suggestions from several websites, I wrote this script:

import requests as rq
from bs4 import BeautifulSoup

header = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0'}

def get_price(site):
    html = rq.get(site, headers=header).text
    soup = BeautifulSoup(html, 'html.parser')
    try:
        price = soup.find(id="priceblock_ourprice").get_text()
        print(site)
        print(price)
    except AttributeError:  # find() returned None, which has no .get_text()
        print(site)
        print("failed")

sites = ["https://www.amazon.in/Apple-iPhone-11-64GB-Green/dp/B07XVKBY68/ref=sr_1_7?keywords=iphone+11&qid=1573668357&sr=8-7",
        "https://www.amazon.it/Apple-iPhone-64GB-Verde-Ricondizionato/dp/B082DN72G3/ref=sr_1_19?__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=iphone+11&qid=1601755114&sr=8-19", 
        "https://www.amazon.it/Apple-iPhone-11-128GB-Verde/dp/B07XS5MSW4/ref=sr_1_1_sspa?__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=iphone+11&qid=1601755114&sr=8-1-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUExNlhGMElFNUhJMTBJJmVuY3J5cHRlZElkPUEwMTI2OTMxMVpXWEtHQ1o5S0ZENCZlbmNyeXB0ZWRBZElkPUEwOTMyMTczMVdMMzlQOTRPTUE3SCZ3aWRnZXROYW1lPXNwX2F0ZiZhY3Rpb249Y2xpY2tSZWRpcmVjdCZkb05vdExvZ0NsaWNrPXRydWU=" ]

for site in sites:
    get_price(site)
    print("\n")

When I run it, I get this:

https://www.amazon.in/Apple-iPhone-11-64GB-Green/dp/B07XVKBY68/ref=sr_1_7?keywords=iphone+11&qid=1573668357&sr=8-7
₹ 64,499.00

https://www.amazon.it/Apple-iPhone-64GB-Verde-Ricondizionato/dp/B082DN72G3/ref=sr_1_19?__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=iphone+11&qid=1601755114&sr=8-19
failed

https://www.amazon.it/Apple-iPhone-11-128GB-Verde/dp/B07XS5MSW4/ref=sr_1_1_sspa?__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=iphone+11&qid=1601755114&sr=8-1-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUExNlhGMElFNUhJMTBJJmVuY3J5cHRlZElkPUEwMTI2OTMxMVpXWEtHQ1o5S0ZENCZlbmNyeXB0ZWRBZElkPUEwOTMyMTczMVdMMzlQOTRPTUE3SCZ3aWRnZXROYW1lPXNwX2F0ZiZhY3Rpb249Y2xpY2tSZWRpcmVjdCZkb05vdExvZ0NsaWNrPXRydWU=
749,00 €

I can't figure out why the second site fails.

Yet the string priceblock_ourprice is present in the page:

$ wget -q -O - 'https://www.amazon.it/Apple-iPhone-64GB-Verde-Ricondizionato/dp/B082DN72G3/ref=sr_1_19?__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=iphone+11&qid=1601755114&sr=8-19' 2>&1 | grep \"priceblock_ourprice\"
<span id="priceblock_ourprice" class="a-size-medium a-color-price priceBlockBuyingPriceString">629,00 €</span>
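The grep only shows that the id appears in the raw bytes; it says nothing about whether a given parser keeps that element in the tree it builds. As a diagnostic sketch (the `diagnose` helper is hypothetical, not part of bs4), you can run the same `find(id=...)` lookup with each parser on the same HTML and compare. Parsers whose library is not installed are reported as `None`, using bs4's `FeatureNotFound` exception.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def diagnose(html, element_id, parsers=("html.parser", "lxml", "html5lib")):
    """Compare raw-text presence of an id with what each parser finds.

    Returns a dict: "raw" -> bool, plus parser name -> True/False,
    or None when that parser's library is not installed.
    """
    # Crude substring check, like the grep; assumes a double-quoted attribute.
    report = {"raw": f'id="{element_id}"' in html}
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            report[parser] = None  # lxml / html5lib not installed
            continue
        report[parser] = soup.find(id=element_id) is not None
    return report
```

Calling it as `diagnose(rq.get(site, headers=header).text, "priceblock_ourprice")` on the second URL should show whether html.parser is the odd one out. Running it on the text that requests actually fetched, rather than on wget's download (which sends a different User-Agent), compares like with like.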
mastupristi

1 Answer


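The problem is that Amazon serves HTML that Python's built-in html.parser does not parse correctly, so the element is missing from the resulting tree even though it is in the raw text. The solution is to use the lxml or html5lib parser instead (both are separate packages, installed with e.g. pip install lxml html5lib):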

import requests as rq
from bs4 import BeautifulSoup


header = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0'}

def get_price(site):
    html = rq.get(site, headers=header).text
    soup = BeautifulSoup(html, 'lxml')      # <--- use 'lxml' or 'html5lib' parser
    try:
        price = soup.find(id="priceblock_ourprice").get_text()
        print(site)
        print(price)
    except AttributeError:  # find() returned None, which has no .get_text()
        print(site)
        print("failed")

sites = ["https://www.amazon.in/Apple-iPhone-11-64GB-Green/dp/B07XVKBY68/ref=sr_1_7?keywords=iphone+11&qid=1573668357&sr=8-7",
        "https://www.amazon.it/Apple-iPhone-64GB-Verde-Ricondizionato/dp/B082DN72G3/ref=sr_1_19?__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=iphone+11&qid=1601755114&sr=8-19", 
        "https://www.amazon.it/Apple-iPhone-11-128GB-Verde/dp/B07XS5MSW4/ref=sr_1_1_sspa?__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=iphone+11&qid=1601755114&sr=8-1-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUExNlhGMElFNUhJMTBJJmVuY3J5cHRlZElkPUEwMTI2OTMxMVpXWEtHQ1o5S0ZENCZlbmNyeXB0ZWRBZElkPUEwOTMyMTczMVdMMzlQOTRPTUE3SCZ3aWRnZXROYW1lPXNwX2F0ZiZhY3Rpb249Y2xpY2tSZWRpcmVjdCZkb05vdExvZ0NsaWNrPXRydWU=" ]

for site in sites:
    get_price(site)
    print("\n")

Prints:

https://www.amazon.in/Apple-iPhone-11-64GB-Green/dp/B07XVKBY68/ref=sr_1_7?keywords=iphone+11&qid=1573668357&sr=8-7
₹ 64,499.00


https://www.amazon.it/Apple-iPhone-64GB-Verde-Ricondizionato/dp/B082DN72G3/ref=sr_1_19?__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=iphone+11&qid=1601755114&sr=8-19
744,89 €


https://www.amazon.it/Apple-iPhone-11-128GB-Verde/dp/B07XS5MSW4/ref=sr_1_1_sspa?__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=iphone+11&qid=1601755114&sr=8-1-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUExNlhGMElFNUhJMTBJJmVuY3J5cHRlZElkPUEwMTI2OTMxMVpXWEtHQ1o5S0ZENCZlbmNyeXB0ZWRBZElkPUEwOTMyMTczMVdMMzlQOTRPTUE3SCZ3aWRnZXROYW1lPXNwX2F0ZiZhY3Rpb249Y2xpY2tSZWRpcmVjdCZkb05vdExvZ0NsaWNrPXRydWU=
749,00 €
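One caveat with the code above: if lxml is not installed, `BeautifulSoup(html, 'lxml')` raises `FeatureNotFound` outside the `try`, so the whole loop stops. A hedged refactoring sketch, not from the original answer: the hypothetical helpers `make_soup` and `extract_price` separate parsing from fetching (so the lookup can be tested on a plain string), pick the first installed parser from a preference list, and return `None` instead of relying on an exception handler. The parser order is an assumption; adjust it to your environment.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Parsers in order of preference (an assumption). lxml and html5lib are
# separate installs; html.parser ships with Python.
PREFERRED_PARSERS = ("lxml", "html5lib", "html.parser")


def make_soup(html, parsers=PREFERRED_PARSERS):
    """Parse html with the first parser in `parsers` that is installed."""
    for parser in parsers:
        try:
            return BeautifulSoup(html, parser)
        except FeatureNotFound:
            continue  # parser library missing, try the next one
    raise FeatureNotFound(f"none of the parsers {parsers} is installed")


def extract_price(html, element_id="priceblock_ourprice"):
    """Return the stripped text of the price element, or None if absent."""
    tag = make_soup(html).find(id=element_id)
    return tag.get_text(strip=True) if tag is not None else None
```

To use it in the loop above, replace `get_price(site)` with `print(site)` followed by `print(extract_price(rq.get(site, headers=header).text) or "failed")`. Keeping html.parser as the last fallback is a trade-off: the script keeps running when neither lxml nor html5lib is installed, but on the Amazon page from the question it may reproduce the original failure, so you might prefer to drop it from the list and let `FeatureNotFound` surface.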
Andrej Kesely