I have read a URL with this code:
import urllib2
from bs4 import BeautifulSoup
# url and hdr (the request headers) are defined earlier in my script
req = urllib2.Request(url, headers=hdr)
req2 = urllib2.urlopen(req)
content = req2.read()
soup = BeautifulSoup(content, "lxml")
I want to scrape a website whose markup has a structure like this:
<div class='\"companyNameWrapper\"'>
\r\n
<div class='\"companyName\"'>
ACP Holding Deutschland GmbH
</div>
\r\n
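A likely explanation for the slashes, stated as an assumption because the raw response isn't shown: the page delivers this markup inside a JSON or JavaScript string, so the raw text contains `class=\"companyName\"` with literal backslashes. An HTML parser such as lxml will most likely read `\"companyName\"` as an unquoted attribute value, backslashes and quotes included, which would explain why BeautifulSoup prints it back as `class='\"companyName\"'`. The sketch below checks this idea with Python 3's built-in `html.parser` instead of bs4/lxml, so it runs without extra packages; the `RAW` sample and the `ClassCollector` class are hypothetical, reconstructed from the structure above.

```python
from html.parser import HTMLParser

# Hypothetical reconstruction of the raw markup: literal backslashes before
# the quotes and a literal "\r\n" between tags (raw strings, so Python itself
# does not turn \r or \n into line breaks).
RAW = (r'<div class=\"companyNameWrapper\">\r\n'
       r'<div class=\"companyName\">ACP Holding Deutschland GmbH</div>\r\n'
       r'</div>')


class ClassCollector(HTMLParser):
    """Collects the class attribute value of every <div> start tag."""

    def __init__(self):
        super().__init__()
        self.classes = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs exactly as the parser read them
        if tag == "div":
            for name, value in attrs:
                if name == "class":
                    self.classes.append(value)


collector = ClassCollector()
collector.feed(RAW)
collector.close()
for value in collector.classes:
    print(value)
```

Both printed values keep the backslashes and the double quotes, so neither can equal the plain string `"companyName"` that the search is given.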
The problem is that, because of the backslashes, a call like
soup.findAll("div", {"class":"companyName"})
finds nothing. I tried converting soup to a string so I could use .replace('\\', ''), but the result is then a plain str, and soup.findAll (and similar bs4 methods) no longer work on it.
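One possible way around this, assuming the raw response escapes quotes and line breaks JSON/JavaScript-style (`\"` and `\r\n`, as the output above suggests): do the replacement on the raw `content` string before it is parsed, then pass the cleaned string to BeautifulSoup, so the result is a soup again. Replacing inside `str(soup)` is unlikely to help, because BeautifulSoup has already stored the value as `\"companyName\"` and serialises it as `class='\"companyName\"'`; stripping the backslashes at that point leaves the double quotes inside the value. The helper name `parse_unescaped` and the sample string are hypothetical. `"html.parser"` is used only to avoid the lxml dependency; `"lxml"`, as in the question, should behave the same on the cleaned markup. `find_all` is the newer spelling of `findAll`.

```python
from bs4 import BeautifulSoup


def parse_unescaped(content):
    """Undo backslash escaping in raw markup, then parse it.

    Assumes JSON/JavaScript-style escapes: \\" for a double quote and
    \\r / \\n for line breaks. Any other escapes are left untouched.
    """
    cleaned = (content.replace('\\"', '"')
                      .replace('\\r', '\r')
                      .replace('\\n', '\n'))
    return BeautifulSoup(cleaned, "html.parser")


# Hypothetical sample matching the structure shown in the question
sample = (r'<div class=\"companyNameWrapper\">\r\n'
          r'<div class=\"companyName\">ACP Holding Deutschland GmbH</div>\r\n'
          r'</div>')

soup = parse_unescaped(sample)
for div in soup.find_all("div", {"class": "companyName"}):
    print(div.get_text(strip=True))
```

With the Python 2 code above, `content` from `req2.read()` is already a `str`, so `soup = parse_unescaped(content)` would replace the original `BeautifulSoup(content, "lxml")` line. Under Python 3, `read()` returns bytes, which would need decoding first (for example `content.decode("utf-8")`, assuming the page is UTF-8).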
Does anyone have a suggestion?
Thanks