Given a company ticker or name, I would like to get its sector using Python.
I have already tried several potential solutions, but none has worked successfully.
The two most promising are:
1) Using the script from: https://gist.github.com/pratapvardhan/9b57634d57f21cf3874c
from urllib import urlopen
from lxml.html import parse
'''
Returns a tuple (Sector, Industry)
Usage: GFinSectorIndustry('IBM')
'''
def GFinSectorIndustry(name):
    tree = parse(urlopen('http://www.google.com/finance?&q='+name))
    return tree.xpath("//a[@id='sector']")[0].text, tree.xpath("//a[@id='sector']")[0].getnext().text
However, that script targets Python 2 and I am using Python 3.8.
I have been able to adapt the first part of it, but the last line is not working. I am completely new to scraping web pages, so I would appreciate any suggestions.
Here is my current code:
from urllib.request import Request, urlopen
from lxml.html import parse
name = "IBM"
req = Request('http://www.google.com/finance?&q='+name, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req)
tree = parse(webpage)
But then the last part is not working, and I am very new to XPath syntax:
tree.xpath("//a[@id='sector']")[0].text, tree.xpath("//a[@id='sector']")[0].getnext().text
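To show what I expect that expression to do, here is a minimal sketch that runs the same XPath against a made-up HTML snippet instead of the live page. The markup in SAMPLE_HTML and the helper name extract_sector_industry are my own assumptions for illustration, not the real structure of the Google Finance page. It also checks for an empty XPath result, since indexing an empty list with [0] raises an IndexError:

```python
from lxml import html

# Hypothetical markup: an <a id="sector"> link followed by a sibling <a>
# holding the industry. This is an assumption, not the real page layout.
SAMPLE_HTML = """
<html><body>
  <div>
    <a id="sector">Technology</a>
    <a>Computer Services</a>
  </div>
</body></html>
"""


def extract_sector_industry(page_source):
    """Return (sector, industry), or None if the expected links are missing."""
    tree = html.fromstring(page_source)
    matches = tree.xpath("//a[@id='sector']")
    if not matches:
        # xpath() returned an empty list, so [0] would raise IndexError.
        return None
    sector_link = matches[0]
    industry_link = sector_link.getnext()  # next sibling element, or None
    if industry_link is None:
        return None
    return sector_link.text, industry_link.text


print(extract_sector_industry(SAMPLE_HTML))
print(extract_sector_industry("<html><body><p>no sector here</p></body></html>"))
```

If the helper returns None for the page I actually download, that would suggest the page no longer contains an element with id 'sector', rather than a problem with the XPath syntax itself.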
2) The other option was embedding R's TTR package, as shown here: Find which sector a stock belongs to
However, I want to run it within my Jupyter notebook, and it is just taking ages to run ss <- stockSymbols()