I am new to Python and am trying to write a script that scrapes a website and returns the text of a few links. I cannot figure out why it is not working, and I would like to understand the reason. My regular expression is:
> regex = re.compile(r'<a target="_blank" title=".+" href=".+.pdf">(.+)</a>')
Full code:
import re
import requests
response = requests.get('websithere')
websiteData = response.text
regex = re.compile(r'<a target="_blank" title=".+" href=".+.pdf">(.+)</a>')
mo = regex.findall(websiteData)
print(mo)
I put (.+) in a capture group, expecting it to capture the text of each link. The three links it is scanning are:
> <a target="_blank" title="Farm Business & Production Management
> Instructor" href="/uploadedpdfs/job-opportunities/Farm Business
> Production Mgt Instructor 8-17.pdf">Farm Business & Production
> Management Instructor</a>
>
> <a target="_blank" title="Paramedic Tech Adjunct Instructor Aide"
> href="/uploadedpdfs/job-opportunities/Paramedic Adjunct Instructor
> Aide.pdf">Paramedic Tech Adjunct Instructor Aide</a>
>
> <a target="_blank" title="Technology Support Specialist"
> href="/uploadedpdfs/job-opportunities/Technology Support
> Specialist.pdf">Technology Support Specialist</a>
Instead, the result contains only the last link's text: 'Technology Support Specialist'
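The behavior can be reproduced without the network request. This is only a sketch: it assumes the three links sit on a single line in the page source, which may not match the real page, and the names `html`, `greedy`, and `lazy` are made up for illustration. Under that assumption the original pattern produces a single match that runs from the first link to the last, while a variant with non-greedy `.+?` quantifiers (and an escaped dot before `pdf`) returns all three texts, so greedy matching may be what is involved here:

```python
import re

# Hypothetical stand-in for response.text: the three links on one line.
# This layout is an assumption, not something confirmed from the real page.
html = (
    '<a target="_blank" title="Farm Business & Production Management Instructor" '
    'href="/uploadedpdfs/job-opportunities/Farm Business Production Mgt Instructor 8-17.pdf">'
    'Farm Business & Production Management Instructor</a> '
    '<a target="_blank" title="Paramedic Tech Adjunct Instructor Aide" '
    'href="/uploadedpdfs/job-opportunities/Paramedic Adjunct Instructor Aide.pdf">'
    'Paramedic Tech Adjunct Instructor Aide</a> '
    '<a target="_blank" title="Technology Support Specialist" '
    'href="/uploadedpdfs/job-opportunities/Technology Support Specialist.pdf">'
    'Technology Support Specialist</a>'
)

# Original pattern: each greedy .+ grabs as much of the line as it can.
greedy = re.compile(r'<a target="_blank" title=".+" href=".+.pdf">(.+)</a>')

# Variant: .+? stops at the earliest possible point; \. matches a literal dot.
lazy = re.compile(r'<a target="_blank" title=".+?" href=".+?\.pdf">(.+?)</a>')

greedy_result = greedy.findall(html)
lazy_result = lazy.findall(html)

print(greedy_result)  # one match, with only the last link's text captured
print(lazy_result)    # one entry per link
```

If the real page puts line breaks inside the tags, as in the quoted links above, `.` would also fail to cross them unless `re.DOTALL` is passed, so the layout of the actual HTML matters.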
What am I doing wrong here? I just want to return the text inside each <a> tag. I have tried adjusting the pattern but cannot get it to work. Any help would be appreciated.
Thanks!