When inspecting the soup in pdb (breakpoint set just before your for loop), I found:
(Pdb++) p soup
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n\n<html><head>\n<title>410 Gone</title>\n</head><body>\n<h1>Gone</h1>\n<p>The requested resource<br/>/farming<br/>\nis no longer available on this server and there is no forwarding address.\nPlease remove all references to this resource.</p>\n</body></html>\n
That 410 Gone response probably means there is some anti-scraping measure in place! The site has detected that you're trying to scrape with Python and served you an error page instead of the data, so there was nothing for your loop to find.
In the future, I recommend using pdb to inspect the code, or simply printing out the soup, when you run into an issue like this! It can clear up what happened and show you which tags are actually available.
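For example, here is a minimal sketch of that kind of inspection (reusing the /farming URL from your code; the exact output depends on what the server sends back):

import requests
from bs4 import BeautifulSoup

url = 'https://www.donedeal.ie/farming'
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

# prettify() renders the parsed tree with indentation, which makes it
# easy to see whether the tags you expect are actually in the page.
print(soup.prettify())

# Or drop into the debugger and poke at `soup` interactively:
# import pdb; pdb.set_trace()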
EDIT:
Although I wouldn't necessarily recommend it (scraping is against DoneDeal's terms of service), there is a way to get around this. If you feel like living on the wild side, you can make the requests module's HTTP request look like it's coming from a real browser rather than a script. You can do this using the following:
import requests
from bs4 import BeautifulSoup

# Pretend to be a desktop browser so the server doesn't reject the script.
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

def donedeal(max_pages):
    for i in range(1, max_pages + 1):
        page = (i - 1) * 28  # 28 listings per page
        url = 'https://www.donedeal.ie/farming?sort=publishdate%20desc&start={}'.format(page)
        source_code = requests.get(url, headers=headers)
        plain_text = source_code.content
        soup = BeautifulSoup(plain_text, "html.parser")
        for title in soup("p", {"class": "card__body-title"}):
            x = title.text
            print(x)

donedeal(1)
All I did was tell the requests module to send the headers provided in headers with each request. This makes the request look like it's coming from Chrome on a Mac rather than from a Python script.
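If you want to confirm that the headers are what make the difference, a quick sanity check (just a sketch, reusing the URL and headers from above) is to compare the status codes with and without them:

import requests

url = 'https://www.donedeal.ie/farming'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

# The bare request is what produced the 410 Gone page above;
# with the header you should get a normal 200 response instead.
print(requests.get(url).status_code)
print(requests.get(url, headers=headers).status_code)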
I tested this and it printed out the titles you want, with no 410 error! :)
See this answer for more.