
I'm writing a simple scraper to pull the GTIN of products. I can fetch the HTML and locate the GTIN element, but I'm wondering what the best way is to extract the value as an integer. More generally, how do I read an attribute such as content= and get the value assigned to it?

import requests
from bs4 import BeautifulSoup

testing_link = 'https://www.walmart.com/ip/Better-Homes-Gardens-Leighton-Nightstand-Rustic-Cherry-Finish/54445647'

URL = testing_link
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

results = soup.find(itemprop='gtin13')

print(results.prettify())

When I run this, I get:

<span content="0042666029322" itemprop="gtin13"></span>

I want to get 0042666029322 as an integer to use later. Any advice?

Daniel N

1 Answer


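An int can't keep leading zeros: integer values carry no formatting, and in Python source a literal with a leading zero was octal in Python 2 and is a syntax error in Python 3. Read the attribute as a string and convert it to int afterwards.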

>>> content = results.get('content')
>>> print(content)
0042666029322
>>> print(int(content))
42666029322
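
For reference, here is a self-contained sketch of the same idea that runs without hitting the network. It assumes the tag looks exactly like the one printed in the question; the names html, tag, gtin_text and gtin_number are only illustrative.

```python
from bs4 import BeautifulSoup

# Inline copy of the tag from the question, so no request is needed.
html = '<span content="0042666029322" itemprop="gtin13"></span>'

soup = BeautifulSoup(html, 'html.parser')
tag = soup.find(itemprop='gtin13')

# find() returns None when nothing matches, and .get() returns None
# when the attribute is missing, so guard before converting.
gtin_text = tag.get('content') if tag is not None else None

# int() accepts a digit string with leading zeros; the zeros are dropped.
gtin_number = int(gtin_text) if gtin_text is not None else None

print(gtin_text)
print(gtin_number)
```

Keeping gtin_text around alongside gtin_number is probably worthwhile, since the string is the only form that still has the leading zeros.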
Camilo
  • Thanks for the help! What if I wanted to keep the leading zeros? Is there a way to include them in an int? – Daniel N Sep 15 '20 at 23:48
  • Why do you want to do that? You can't force a numeric type to keep leading zeros. If you really need them, you'll have to come up with your own data type (based on strings or lists to store any leading zeros). – Camilo Sep 16 '20 at 04:54
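
If the zeros are only needed for display or for passing the GTIN to another system, one possible approach is to store the int and pad it back to a fixed width when needed. A minimal sketch, assuming the value is always a 13-digit GTIN (as the gtin13 attribute name suggests); the variable names are illustrative:

```python
gtin_text = '0042666029322'   # value read from the content attribute
gtin_number = int(gtin_text)  # 42666029322, leading zeros are gone

# Pad back to 13 digits with str.zfill ...
padded = str(gtin_number).zfill(13)

# ... or with a format specification (0 = zero fill, 13 = width).
formatted = f'{gtin_number:013d}'

print(padded)
print(formatted)
```

Both lines rebuild the original string. If no arithmetic is ever done on the GTIN, simply keeping it as a string may be the simpler choice.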