I've written a script in Python using the re module to get the titles of different questions from a webpage. My intention is to parse the titles without using BeautifulSoup. The pattern I've used does match them, but the output also includes part of the surrounding markup. How can I get only the question titles and nothing else?
Here is my attempt (using re.search()):
import requests
import re

link = "https://stackoverflow.com/questions/tagged/web-scraping"
res = requests.get(link).text
for item in res.splitlines():
    matchitem = re.search(r'hyperlink">(How.+)</a>', item)
    if matchitem:
        print(matchitem.group())
The output I'm getting looks like this (one line out of several):
hyperlink">How to use Selenium check the checkbox lists?</a>
What I'd like to get instead:
How to use Selenium check the checkbox lists?
I'm very new to regex, so I apologize in advance if this isn't a well-formed question.
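For reference, one likely cause is that `group()` with no argument returns the whole match, while `group(1)` returns only the text captured by the first pair of parentheses. Below is a minimal sketch of that idea, not a definitive fix. To keep it runnable offline it works on a hypothetical `sample_html` string instead of the live page, the `question-hyperlink` class is an assumption based on the output shown above, and `extract_titles` is an illustrative helper name. It also swaps `.+` for the non-greedy `.+?`, so the capture stops at the first `</a>` on the line.

```python
import re

# Hypothetical sample standing in for the downloaded page source;
# the real markup on the site may differ.
sample_html = '''
<a href="/questions/1" class="question-hyperlink">How to use Selenium check the checkbox lists?</a>
<a href="/questions/2" class="question-hyperlink">Why does my scraper return nothing?</a>
<a href="/questions/3" class="question-hyperlink">How to parse a table with regex?</a>
'''

def extract_titles(html):
    """Return the captured title text for every line that matches the pattern."""
    titles = []
    for line in html.splitlines():
        # The parentheses form capture group 1; `.+?` is non-greedy,
        # so it stops at the first closing </a> on the line.
        match = re.search(r'hyperlink">(How.+?)</a>', line)
        if match:
            # group(1) is only the captured text; group() or group(0)
            # would be the entire match, including 'hyperlink">' and '</a>'.
            titles.append(match.group(1))
    return titles

titles = extract_titles(sample_html)
for title in titles:
    print(title)
```

In the original loop, the equivalent change would be printing `matchitem.group(1)` instead of `matchitem.group()`.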