I am new to web scraping and have two questions:
- How do I convert the resulting string into a pandas DataFrame?
- How do I remove the \xa0 characters from it?
Here is the code:
import urllib.request as req
import bs4

url = 'https://finance.yahoo.com/quote/GILD/profile?p=GILD'

# create a request object, add request header information
request = req.Request(url, headers={
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36'
})

# download the page body
with req.urlopen(request) as response:
    data = response.read()

# parse the HTML and locate the profile block
root = bs4.BeautifulSoup(data, 'html.parser')
# print(root)
profile = root.find('div', attrs={'class': 'Mb(25px)'})
records = []
profile.text
And this is the result:
'333 Lakeside DriveFoster City, CA 94404United States650-574-3000http://www.gilead.comSector(s):\xa0HealthcareIndustry:\xa0Drug Manufacturers—GeneralFull Time Employees:\xa011,800'
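A minimal sketch of one way to approach both points, not a definitive answer. It works on a copy of the string shown above instead of a live request, and it assumes the labels "Sector(s):", "Industry:" and "Full Time Employees:" always appear in that order. The helper names clean_text, parse_profile and to_dataframe are hypothetical, made up for this illustration.

```python
import re

# Sample text copied from the result above; in the real script this would be
# profile.text. (\u2014 is the long dash that appears in the industry name.)
raw = ('333 Lakeside DriveFoster City, CA 94404United States650-574-3000'
       'http://www.gilead.comSector(s):\xa0HealthcareIndustry:\xa0'
       'Drug Manufacturers\u2014GeneralFull Time Employees:\xa011,800')


def clean_text(text):
    # \xa0 is a non-breaking space, not an HTML tag; swap it for a normal space.
    return text.replace('\xa0', ' ')


def parse_profile(text):
    # Capture the value that follows each label (assumes this label order).
    pattern = (r'Sector\(s\):\s*(.*?)'
               r'Industry:\s*(.*?)'
               r'Full Time Employees:\s*(.*)')
    match = re.search(pattern, clean_text(text))
    if match is None:
        return {}
    sector, industry, employees = (part.strip() for part in match.groups())
    return {
        'Sector': sector,
        'Industry': industry,
        'Full Time Employees': employees,
    }


def to_dataframe(record):
    # pandas is imported here so the helpers above also run without it installed.
    import pandas as pd
    return pd.DataFrame([record])  # one row, one column per field


record = parse_profile(raw)
```

Calling to_dataframe(record) should then give a one-row DataFrame with the columns Sector, Industry and Full Time Employees. Splitting the address and phone number would need separate handling, since the text has no delimiter between those parts.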