How to obtain stock market company sector from ticker or company name in python

Question

Given a company ticker or name I would like to get its sector using python.

I have already tried several potential solutions, but none has worked successfully.

The two most promising are:

1) Using the script from: https://gist.github.com/pratapvardhan/9b57634d57f21cf3874c

from urllib import urlopen
from lxml.html import parse

'''
Returns a tuple (Sector, Industry)
Usage: GFinSectorIndustry('IBM')
'''
def GFinSectorIndustry(name):
  tree = parse(urlopen('http://www.google.com/finance?&q='+name))
  return tree.xpath("//a[@id='sector']")[0].text, tree.xpath("//a[@id='sector']")[0].getnext().text

However, I am using Python 3.8, and this script was written for Python 2.

I have been able to adapt part of this solution, but the last line does not work. I am completely new to scraping web pages, so I would appreciate any suggestions.

Here is my current code:

from urllib.request import Request, urlopen
from lxml.html import parse

name="IBM"
req = Request('http://www.google.com/finance?&q='+name, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req)

tree = parse(webpage)

But then the last part does not work, and I am very new to XPath syntax:

tree.xpath("//a[@id='sector']")[0].text, tree.xpath("//a[@id='sector']")[0].getnext().text

2) The other option was embedding R's TTR package, as shown here: Find which sector a stock belongs to

However, I want to run it within my Jupyter notebook, and ss <- stockSymbols() takes ages to run.

There is no `a` tag whose `id` is *sector* on, say, [www.google.com/search?q=MSFT](https://www.google.com/search?q=MSFT). Do you have a concrete example of what you actually want to get? If so, please add it to your question. – keepAlive, Jun 16 '20 at 22:36
@keepAlive, for example, from https://www.marketwatch.com/investing/stock/ibm I want to get the *sector*, which is *Business/Consumer Services*. It is shown to the left of the graph, below the stock price. – alejandro, Jun 16 '20 at 22:39

keepAlive · Answer 1 · 2020-06-16T23:21:40.557

1

Following your comment, for marketwatch.com/investing/stock specifically, the XPath that is likely to work is "//div[@class='intraday__sector']/span[@class='label']", meaning that

tree.xpath("//div[@class='intraday__sector']/span[@class='label']")[0].text

should return the desired information.

I am completely new to scraping web pages [...]

Some clarifications:

The right XPath depends entirely on the website you are looking at. That is why there was no hope of finding "//a[@id='sector']" in the page you mention in the comments: that XPath (now outdated) was specific to Google Finance. Put differently, you first need to "study" the page you are interested in to find out where the information you want is located.
To conduct such a "study", I use Chrome DevTools and check any XPath in the console with $x(<your-xpath-of-interest>), where the function $x is documented in the Chrome DevTools console utilities reference (with examples!).
Luckily for you, the information you want from marketwatch.com/investing/stock (the sector's name) is statically generated, i.e. not generated dynamically at page load. Otherwise, other scraping techniques would have been required, using other Python libraries such as Selenium, but that is another question.
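To make the points above concrete, here is a minimal sketch of applying that XPath with lxml and guarding against the empty-list case. The HTML snippet is hypothetical markup modelled on the class names quoted above, not a copy of the real page, and `extract_sector` is just an illustrative helper name.

```python
from lxml.html import fromstring

# Hypothetical markup, modelled on the class names quoted in this answer.
# The real page may differ or change over time.
SAMPLE_HTML = """
<html><body>
  <div class="intraday__sector">
    <span class="label">Business/Consumer Services</span>
  </div>
</body></html>
"""

SECTOR_XPATH = "//div[@class='intraday__sector']/span[@class='label']"


def extract_sector(html_text, xpath=SECTOR_XPATH):
    """Return the text of the first node matching xpath, or None if nothing matches."""
    tree = fromstring(html_text)
    matches = tree.xpath(xpath)
    if not matches:
        # An empty list means the XPath does not fit this page's structure.
        return None
    text = matches[0].text
    return text.strip() if text else None


print(extract_sector(SAMPLE_HTML))
print(extract_sector("<html><body><p>no sector here</p></body></html>"))
```

The second call returns None instead of raising an IndexError, which is what indexing an empty result with [0] would do when an XPath written for one site is applied to another.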

edited Jun 16 '20 at 23:21

answered Jun 16 '20 at 22:54

keepAlive


`tree.xpath("//div[@class='intraday__sector']/span[@class='label']")` gives an empty `list` when applied to the `tree` variable generated by the code in the question, which uses the Google website. – alejandro Jun 16 '20 at 23:07
@alejandro An XPath that works for one website won't work for another. This one works for [marketwatch.com/investing/stock](https://www.marketwatch.com/investing/stock/ibm). See update. – keepAlive Jun 16 '20 at 23:24
Yeah, I figured, but when I tried it on that website I got the following error: `HTTPError: HTTP Error 405: Method Not Allowed`, which seems to be related to the `urlopen` part of the code. – alejandro Jun 16 '20 at 23:30
@al Note that the 405 error is independent of the core subject of your post. That being said, have you seen [that](https://stackoverflow.com/q/47951270/4194079)? – keepAlive Jun 16 '20 at 23:39
I was indeed looking at it ;) OK, I will keep troubleshooting then! Thanks for your nice response so far. Once I am done with this, I'll accept it if all works correctly. – alejandro Jun 16 '20 at 23:42
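For the 405 error discussed in the comments above, a minimal sketch of how the request could be wrapped so that the HTTP status is reported instead of raising. It does not fix the 405 itself; it only makes the failure easy to inspect. The `fetch_html` name and the `opener` parameter are illustrative assumptions (the latter exists only so the function can be exercised without network access).

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_html(url, opener=urlopen, user_agent="Mozilla/5.0"):
    """Return (status, body_bytes); body is None when the server answers with an HTTP error."""
    req = Request(url, headers={"User-Agent": user_agent})
    try:
        with opener(req) as response:
            return response.status, response.read()
    except HTTPError as err:
        # err.code holds the HTTP status, e.g. 405 for "Method Not Allowed".
        return err.code, None


# Usage (needs network access):
# status, body = fetch_html("https://www.marketwatch.com/investing/stock/ibm")
```

Checking the returned status before parsing makes it clear whether a problem lies in the request or in the XPath.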

oo7knutson · Answer 2 · 2020-11-30T12:57:03.893

You can easily obtain the sector for any given company/ticker from Yahoo Finance, using the yfinance package:

import yfinance as yf

tickerdata = yf.Ticker('TSLA')  # the ticker symbol for Tesla
print(tickerdata.info['sector'])

The code prints: Consumer Cyclical

If you want other information about the company/ticker, just print(tickerdata.info) to see all other possible dictionary keys and corresponding values, like ['sector'] used in the code above.
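If you need the sector for several tickers, something along these lines might help. It is only a sketch: it assumes `.info` behaves like a dictionary, as in the code above, and that the 'sector' key can be missing for some symbols. The helper names `sector_from_info`, `sectors_for` and `yfinance_info` are made up for illustration.

```python
def sector_from_info(info, default="Unknown"):
    """Pick the sector out of a yfinance-style info mapping, tolerating missing keys."""
    return (info or {}).get("sector") or default


def sectors_for(tickers, fetch_info):
    """Map each ticker to its sector; fetch_info is any callable returning an info dict."""
    return {ticker: sector_from_info(fetch_info(ticker)) for ticker in tickers}


def yfinance_info(ticker):
    # Imported lazily so the helpers above can be used without yfinance installed.
    import yfinance as yf
    return yf.Ticker(ticker).info


# Usage (needs network access and the yfinance package):
# sectors_for(["TSLA", "IBM"], yfinance_info)
```

Using .get with a default avoids a KeyError for symbols that have no sector entry.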

score 0 · Answer 3 · answered Jun 17 '20 at 22:58

To answer the question:

How to obtain stock market company sector from ticker or company name in python?

I had to find a workaround after reading some material and some nice suggestions from @keepAlive.

The following does the job in a reverse way, i.e. gets the companies given the sector. There are 10 sectors, so it is not too much work if one wants info for all sectors: https://www.stockmonitor.com/sectors/

Given that marketwatch.com/investing/stock was throwing a 405 Error, I decided to use https://www.stockmonitor.com/sectors/, for example:

https://www.stockmonitor.com/sector/healthcare/

Here is the code:

import pandas as pd
from lxml.html import parse
from urllib.request import Request, urlopen

# Browser-like User-Agent string sent with the request.
user_agent = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/35.0.1916.47 Safari/537.36"
)

url = 'https://www.stockmonitor.com/sector/healthcare/'

req = Request(url, headers={'User-Agent': user_agent})
webpage = urlopen(req)

tree = parse(webpage)

# Collect the link text from every left-aligned cell of the table body.
healthcare_tickers = []
for element in tree.xpath("//tbody/tr/td[@class='text-left']/a"):
    healthcare_tickers.append(element.text)

pd.Series(healthcare_tickers)

Thus, healthcare_tickers contains the tickers of the companies in the healthcare sector.
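Since this approach goes from sector to tickers, answering the original question (ticker to sector) means inverting the result once all sectors have been collected. A minimal sketch, assuming you have built a dictionary of sector name to ticker list by repeating the scrape per sector; the sample data below is a hand-written placeholder, not scraped output, and the function names are illustrative.

```python
def build_ticker_to_sector(sector_to_tickers):
    """Invert {sector: [tickers]} into {ticker: sector}, normalising ticker case."""
    lookup = {}
    for sector, tickers in sector_to_tickers.items():
        for ticker in tickers:
            lookup[ticker.strip().upper()] = sector
    return lookup


def sector_of(ticker, lookup):
    """Return the sector for a ticker, or None if the ticker is unknown."""
    return lookup.get(ticker.strip().upper())


# Placeholder data for illustration only; in practice each list would come
# from a scrape like the one for healthcare_tickers.
sector_to_tickers = {
    "healthcare": ["JNJ", "PFE"],
    "technology": ["IBM"],
}

lookup = build_ticker_to_sector(sector_to_tickers)
print(sector_of("ibm", lookup))
```

If a ticker appeared under more than one sector, the last one processed would win, so it may be worth checking for duplicates first.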
