I am trying to extract all hotel names for a given country from the following site: https://www.holidaycheck.de/dh/hotels-tunesien/e10cef63-45d4-3511-92f1-43df5cbd9fe1. Since the data is split across several pages, I am trying to set up a loop. Unfortunately, I can't manage to extract the number of pages (the highest page number) from the pager in the HTML to tell my loop where to stop. (I know this question has been asked and answered frequently, and I read through all the posts, but none seems to solve my problem.)
The html code looks like this:
<div class="main-nav-items">
<span class="prev-next">
<span>
<i class="prev-arrow icon icon-left-arrow-line"></i>
<span>previous</span>
</span>
</a>
</span>
<span class="other-page">
<a class="link" href="/dh/hotels-tunesien/e10cef63-45d4-3511-92f1-43df5cbd9fe1">66</a>
What I need is the number right after the href in the last line of the snippet, i.e. the link text (66 in this case).
I tried it with:
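import re
# soup is the BeautifulSoup object for the downloaded page
data = soup.find_all('a', {'class': 'link'})
y = str(data)  # string form of the whole tags, href attribute included
x = re.findall("[0-9]+", y)
print(x)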
But this code also gives me the numbers from the href, such as 45 and 3511.
Additionally I tried:
data = soup.find_all('a', {'class': 'link'})
numbers = [d.text for d in data]
print(numbers)
This worked well, except that "next" and "previous" are also included, and I didn't manage to convert the output into integers from which I could extract the max while dropping "previous" and "next".
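To make the goal concrete, here is a minimal sketch of the conversion I am after. It assumes the link texts are already available as a list of strings (like `numbers` above). The helper name `max_page` and the sample list are made up for illustration and are not taken from the real page.

```python
def max_page(link_texts):
    """Return the largest page number among the link texts.

    Non-numeric labels such as "previous" and "next" are ignored.
    """
    pages = [int(t.strip()) for t in link_texts if t.strip().isdecimal()]
    if not pages:
        raise ValueError("no numeric page links found")
    return max(pages)


# Hypothetical sample of what the pager link texts might look like.
sample = ["previous", "1", "2", "3", "66", "next"]
last_page = max_page(sample)
print(last_page)
```

With the real page, the input would presumably be something like `[d.text for d in soup.find_all('a', {'class': 'link'})]`, assuming every pager link carries the class `link`.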
I also tried a "while" loop as explained here: "scraping data from unknown number of pages using beautiful soup". But somehow this did not return all hotels and skipped pages.
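What I would like instead is a loop with a fixed upper bound, so that every page is visited exactly once. A rough sketch of the idea, assuming the highest page number is already known: `collect_hotels` and `fake_fetch` are hypothetical names, and `fake_fetch` only stands in for whatever function would download and parse one result page.

```python
def collect_hotels(fetch_page, last_page):
    """Visit pages 1..last_page in order and gather every hotel name."""
    hotels = []
    for page in range(1, last_page + 1):
        hotels.extend(fetch_page(page))
    return hotels


def fake_fetch(page):
    # Stand-in for the real download-and-parse step (assumption):
    # returns two made-up hotel names per page.
    return [f"Hotel {page}-{i}" for i in range(2)]


names = collect_hotels(fake_fetch, 3)
print(len(names))
```

How the URL of page n is built on the real site is not shown here, since I have not worked that out.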
I would highly appreciate it if someone could give me some advice on how to fix my problem. Thank you!