I am trying to scrape a series of websites that look like the following three examples:
www.examplescraper.com/fghxbvn/17901234.html
www.examplescraper.com/fghxbvn/17911102.html
www.examplescraper.com/fghxbvn/17921823.html
Please keep in mind that there are 200 of these pages, so I'd like to iterate through them in a loop rather than copying and pasting each URL into a script.
The base is www.examplescraper.com/fghxbvn/, then there's a year, followed by four digits that do not follow a pattern, and then .html.
So in the first website:
base = www.examplescraper.com/fghxbvn/
year = 1790
four random digits = 1234
I would like to open (with Beautiful Soup) a URL built like this:
url = base + str(year) + str(any four ints) + ".html"
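To make the URL shape concrete, here is a minimal sketch of a regular expression that accepts any four digits after the year. The names `BASE`, `URL_PATTERN` and `split_url` are made up for illustration, and the sketch assumes the year is always four digits; it only checks the shape of a URL string and does not fetch anything.

```python
import re

# Assumed shape: base + four-digit year + any four digits + ".html"
BASE = "www.examplescraper.com/fghxbvn/"
URL_PATTERN = re.compile(re.escape(BASE) + r"(\d{4})(\d{4})\.html")


def split_url(url):
    """Return (year, four_digits) if url has the expected shape, else None."""
    match = URL_PATTERN.fullmatch(url)
    if match is None:
        return None
    # Keep the trailing digits as a string so leading zeros survive.
    return int(match.group(1)), match.group(2)


examples = [
    "www.examplescraper.com/fghxbvn/17901234.html",
    "www.examplescraper.com/fghxbvn/17911102.html",
    "www.examplescraper.com/fghxbvn/17921823.html",
]

for url in examples:
    print(url, split_url(url))
```

Here `\d{4}` matches exactly four digits of any value, which is the "any four ints" part of the pseudocode above.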
My question:
How do I (in Python) recognize any four digits? They can be any digits. I don't need to generate the four digits or return them; I just need Python to accept any four digits so I can feed the URL into Beautiful Soup.