I am new to web scraping. I would like to do two things:

  1. Convert the resulting string into a pandas DataFrame.
  2. Remove the \xa0 characters from it.

Here is the code:

import urllib.request as req
from bs4 import BeautifulSoup as bs
import requests
import bs4

url = 'https://finance.yahoo.com/quote/GILD/profile?p=GILD'

# create a request object, add request header information
request = req.Request(url, headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36'
})

with req.urlopen(request) as response:
    data = response.read()

root = bs4.BeautifulSoup(data, 'html.parser')
# print(root)

profile = root.find('div', attrs = {'class' : 'Mb(25px)'})

records = []

profile.text

And this is the result:

'333 Lakeside DriveFoster City, CA 94404United States650-574-3000http://www.gilead.comSector(s):\xa0HealthcareIndustry:\xa0Drug Manufacturers—GeneralFull Time Employees:\xa011,800'
  • Please provide the modules you are importing to run this code – asheets Sep 25 '20 at 23:22
  • Here is a detailed [StackOverflow post](https://stackoverflow.com/questions/10993612/how-to-remove-xa0-from-string-in-python#:~:text=%5Cxa0%20is%20actually%20non%2Dbreaking,by%201%20to%204%20bytes.) that talks about getting rid of the '\xa0' tag. – SaaSy Monster Sep 26 '20 at 02:40
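
For context on the comment above: `\xa0` is not an HTML tag but the non-breaking space character (U+00A0), which is why it shows up in the `repr` of the scraped string. A minimal sketch of two common ways to turn it into a regular space, using a shortened sample string (the variable names are only illustrative):

```python
import unicodedata

# Shortened sample of the scraped text from the question.
text = "Sector(s):\xa0HealthcareIndustry:\xa0Drug Manufacturers"

# Option 1: replace the non-breaking space explicitly.
cleaned = text.replace("\xa0", " ")

# Option 2: NFKC normalization maps U+00A0 to a regular space.
# Note that it also rewrites other compatibility characters.
normalized = unicodedata.normalize("NFKC", text)

print(cleaned)
print(normalized)
```

Option 1 touches only the non-breaking space, so it is the safer choice when the rest of the text should stay byte-for-byte the same.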

1 Answer

Use this function to obtain the HTML from your URL. It uses requests.get; you can add the headers later.

import requests
from bs4 import BeautifulSoup
from requests.exceptions import HTTPError
from urllib3.exceptions import MaxRetryError

# @author Diego Rojas
def _visit(url):
    """Fetch url and return the parsed HTML, or None if the request fails."""
    try:
        response = requests.get(url)
        # Raises HTTPError for 4xx/5xx responses
        response.raise_for_status()
        if response.status_code == 200:
            return BeautifulSoup(response.text, 'html.parser')
    except HTTPError as http_err:
        print(f'HTTP error occurred: {http_err}')  # f-strings need Python 3.6+
    except MaxRetryError as max_err:
        print(f'MaxRetryError occurred: {max_err}')
    except Exception as err:
        print(f'Other error occurred: {err}')
    return None

As for the question about the tag, I don't understand what you want to do.
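
For the DataFrame part of the question, one possible approach is to split the scraped text on the field labels it contains. The sketch below assumes the labels appear exactly as in the output shown in the question; `LABELS`, `parse_profile`, `to_dataframe`, and the `"Contact"` key are hypothetical names introduced for illustration. The address, phone number, and URL are left as one string because the flattened text has no separators between them.

```python
import re

# Labels as they appear in the scraped text from the question (assumption).
LABELS = ["Sector(s)", "Industry", "Full Time Employees"]


def parse_profile(text):
    """Split the labelled part of the profile text into a dict."""
    # Turn non-breaking spaces into regular spaces first.
    text = text.replace("\xa0", " ")
    # Split on "Label:" and keep the label itself via the capturing group.
    pattern = "(" + "|".join(re.escape(label) for label in LABELS) + "):"
    parts = re.split(pattern, text)
    # parts looks like [prefix, label1, value1, label2, value2, ...]
    record = {"Contact": parts[0].strip()}
    for label, value in zip(parts[1::2], parts[2::2]):
        record[label] = value.strip()
    return record


def to_dataframe(record):
    """Build a one-row DataFrame from the parsed record."""
    # pandas is imported here so parse_profile works without it installed.
    import pandas as pd
    return pd.DataFrame([record])


sample = (
    "333 Lakeside DriveFoster City, CA 94404United States650-574-3000"
    "http://www.gilead.comSector(s):\xa0HealthcareIndustry:\xa0"
    "Drug Manufacturers\u2014GeneralFull Time Employees:\xa011,800"
)
record = parse_profile(sample)
print(record)
```

Calling `to_dataframe(record)` then gives a DataFrame with one row and one column per key. Extracting each field from its own HTML element instead of from `profile.text` would avoid the string splitting altogether, but that depends on the page markup, which is not shown here.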

  • It is really appreciated that you are trying to help, but it would be better to provide the person writing the question with a full answer. It would have been better to simply comment on their post saying that their intent with the tags was unclear, as your answer here doesn’t provide them with a full explanation. – StarbuckBarista Oct 06 '20 at 05:19