I am trying to check whether a link ends with ".pdf".
I skip all the characters before ".pdf" using [\w/.-]+
in a regular expression and then check whether ".pdf" follows. I am new to regular expressions.
The code is:
import urllib2
import json
import re
from bs4 import BeautifulSoup
url = "http://codex.cs.yale.edu/avi/os-book/OS8/os8c/slide-dir/"
response = urllib2.urlopen(url)
soup = BeautifulSoup(response.read(), "html.parser")
links = soup.find_all('a')
for link in links:
    name = link.get("href")
    # Skip anchors without an href; escape the dot and anchor the match at the end
    if name and re.match(r'[\w/.-]+\.pdf$', name):
        print name
I want to match name against links of the following type:
PDF-dir/ch1.pdf
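
To make the intent concrete, here is a minimal sketch of the check on its own, without the network request. It assumes the goal is "the href ends in a literal .pdf", so the dot is escaped and the pattern is anchored with $. The helper name is_pdf_link and the sample hrefs other than PDF-dir/ch1.pdf are hypothetical, made up only for illustration.

```python
import re

# Escaped dot plus "$" so that ".pdf" must be the literal end of the string.
PDF_LINK = re.compile(r'[\w/.-]+\.pdf$')


def is_pdf_link(href):
    """Return True if href is a non-empty string ending in '.pdf' (hypothetical helper)."""
    return bool(href) and PDF_LINK.match(href) is not None


# Hypothetical sample hrefs; only the first one comes from the question.
# None stands in for an <a> tag that has no href attribute.
samples = ["PDF-dir/ch1.pdf", "PDF-dir/ch1.pdf.bak", "PDF-dir/ch1xpdf", None]
for href in samples:
    print("%r -> %s" % (href, is_pdf_link(href)))
```

With an unescaped dot and no $, the second and third samples would also match, because . matches any character and re.match only anchors at the start of the string. If the characters before the extension do not need to be restricted, name.endswith(".pdf") is a simpler check.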