I am reading the O'Reilly book about Python scraping. In chapter 3, page 41, the author uses a regular expression to collect all links that begin with "/". She wrote:

    for link in bsObj.findAll("a", href=re.compile("^(/|.*"+includeUrl+")")):
        if link.attrs['href'] is not None:
            if link.attrs['href'] not in internalLinks:
                if(link.attrs['href'].startswith("/")):
                    internalLinks.append(includeUrl+link.attrs['href'])
                else:
                    internalLinks.append(link.attrs['href'])
    return internalLinks

I don't understand why "begins with /" is written this way. I thought that in a regex a special symbol like "/" has to be escaped, as in "\/". Also, what does the "|" (or) in the pattern mean? Please help me understand it. Thanks a lot!
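To make the question concrete, here is a minimal sketch of how the pattern seems to behave when tested on its own. The value of `include_url` and the sample hrefs are made up for illustration and are not from the book. As far as I can tell, Python does not use "/" as a regex delimiter, so it needs no escaping, and the group "(/|...)" accepts either alternative.

```python
import re

# Hypothetical site address, for illustration only (not from the book).
include_url = "example.com"

# Same pattern shape as in the snippet above:
#   ^            anchor at the start of the href
#   ( / | .*X )  either a literal "/", or any characters followed by X
pattern = re.compile("^(/|.*" + include_url + ")")

# Made-up sample hrefs.
hrefs = ["/about", "http://example.com/page", "http://other.org/", "contact.html"]
matches = [h for h in hrefs if pattern.search(h)]
print(matches)

# In Python's re module an unescaped "/" and an escaped "\/" match the same text.
same = bool(re.match("^/", "/about")) and bool(re.match(r"^\/", "/about"))
print(same)
```

So the first alternative picks up relative links such as "/about", and the second picks up absolute links that contain the site's own address.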

AlphaWolf
