1

Given a company ticker or name, I would like to get its sector using Python.

I have already tried several potential solutions, but none has worked successfully.

The two most promising are:

1) Using the script from: https://gist.github.com/pratapvardhan/9b57634d57f21cf3874c

from urllib import urlopen
from lxml.html import parse

'''
Returns a tuple (Sector, Industry)
Usage: GFinSectorIndustry('IBM')
'''
def GFinSectorIndustry(name):
  tree = parse(urlopen('http://www.google.com/finance?&q='+name))
  return tree.xpath("//a[@id='sector']")[0].text, tree.xpath("//a[@id='sector']")[0].getnext().text

However, I am using Python 3.8, and this script was written for Python 2.

I have been able to tweak this solution, but the last line is not working. I am completely new to scraping web pages, so I would appreciate any suggestions.

Here is my current code:

from urllib.request import Request, urlopen
from lxml.html import parse

name="IBM"
req = Request('http://www.google.com/finance?&q='+name, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req)

tree = parse(webpage)

But then the last part is not working and I am very new to this xpath syntax:

tree.xpath("//a[@id='sector']")[0].text, tree.xpath("//a[@id='sector']")[0].getnext().text

2) The other option was embedding R's TTR package as shown here: Find which sector a stock belongs to

However, I want to run it within my Jupyter notebook, and ss <- stockSymbols() is taking ages to run.

alejandro
  • There is no `a` tag whose `id` is *sector* on, say, [www.google.com/search?q=MSFT](https://www.google.com/search?q=MSFT). Do you have a concrete example of what you really want to get? If so, add it to your question. – keepAlive Jun 16 '20 at 22:36
  • @keepAlive, for example from here: https://www.marketwatch.com/investing/stock/ibm I want to get the *sector*, which is: *Business/Consumer Services*. This is shown on the left of the graph, below the stock price. – alejandro Jun 16 '20 at 22:39

3 Answers

1

Following your comment, for marketwatch.com/investing/stock specifically, the xpath that is likely to work is "//div[@class='intraday__sector']/span[@class='label']" meaning that doing

tree.xpath("//div[@class='intraday__sector']/span[@class='label']")[0].text

should return the desired information.

I am completely new to scraping web pages [...]

Some clarifications:

  1. An XPath depends entirely on the website you are looking at. That is why there was no hope of finding "//a[@id='sector']" in the page you mention in the comments: that XPath (now outdated) was specific to Google Finance. Put differently, you first need to "study" the page you are interested in to find where the information you want is located.
  2. To conduct such a "study", I use Chrome DevTools and test any XPath in the console with $x(<your-xpath-of-interest>), where the function $x is documented here (with examples!).
  3. Luckily for you, the information you want from marketwatch.com/investing/stock, the sector's name, is statically generated (i.e. not generated dynamically at page load, in which case other scraping techniques would be required, using other Python libraries such as Selenium, but that is another question).
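
As a minimal sketch of the first point, the snippet below runs the same kind of path against a small hand-written page and guards against the "no match" case, which is what makes a bare [0] fail. The markup in SAMPLE and the helper name first_text are assumptions made for illustration, not the real marketwatch page. It uses the standard library's xml.etree.ElementTree (which supports only a subset of XPath and needs well-formed markup) so that it runs without extra packages or network access; with lxml, tree.xpath(...) also returns a list, so the same guard applies.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for the relevant part of a page (an assumption,
# not the actual marketwatch markup).
SAMPLE = """
<html><body>
  <div class="intraday__sector"><span class="label">Business/Consumer Services</span></div>
</body></html>
""".strip()


def first_text(root, path):
    """Return the text of the first element matching path, or None if nothing matches."""
    matches = root.findall(path)
    return matches[0].text if matches else None


root = ET.fromstring(SAMPLE)

# A path written for this page's structure finds the sector.
sector = first_text(root, ".//div[@class='intraday__sector']/span[@class='label']")

# A path written for a different site matches nothing; indexing [0] directly
# on the empty result would raise IndexError.
missing = first_text(root, ".//a[@id='sector']")

print(sector)
print(missing)
```

Returning None for an empty match makes it easier to tell "wrong path for this site" apart from a network or parsing error.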
keepAlive
  • This `tree.xpath("//div[@class='intraday__sector']/span[@class='label']")` gives an empty `list` when applied to the `tree` variable generated with the code of the question, which uses the Google website. – alejandro Jun 16 '20 at 23:07
  • @alejandro An XPath that works for a given website won't work for another website. This one works for [marketwatch.com/investing/stock](https://www.marketwatch.com/investing/stock/ibm). See update. – keepAlive Jun 16 '20 at 23:24
  • Yeah, I figured, but when I tried to do it for that website I got the following error: `HTTPError: HTTP Error 405: Method Not Allowed`, which seems to be related to the `urlopen` part of the code. – alejandro Jun 16 '20 at 23:30
  • @al Note that the 405 error is independent of the core subject of your post. That being said, have you seen [that](https://stackoverflow.com/q/47951270/4194079)? – keepAlive Jun 16 '20 at 23:39
  • I was indeed looking at it ;) OK, I will keep troubleshooting then! Thanks for your nice response so far; once I am done with this I'll accept if all works correctly. – alejandro Jun 16 '20 at 23:42
1

You can easily obtain the sector for a given ticker with the yfinance package, which pulls data from Yahoo Finance:

import yfinance as yf

tickerdata = yf.Ticker('TSLA')  # the ticker symbol for Tesla
print(tickerdata.info['sector'])

The code prints: Consumer Cyclical

If you want other information about the company/ticker, just print(tickerdata.info) to see all other possible dictionary keys and corresponding values, like ['sector'] used in the code above.
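
A small sketch of a more defensive lookup, assuming that the info dictionary may not contain a 'sector' key for every symbol (for example funds or unknown tickers; this is an assumption, not something the answer above states). The helper name sector_from_info and the two sample dictionaries are hypothetical stand-ins, so the snippet runs without yfinance or network access.

```python
def sector_from_info(info, default=None):
    """Return info['sector'] when present and non-empty, otherwise default."""
    sector = info.get("sector")
    return sector if sector else default


# Hypothetical stand-ins for what tickerdata.info might look like.
stock_info = {"symbol": "TSLA", "sector": "Consumer Cyclical"}
fund_info = {"symbol": "SPY"}  # assumed to have no 'sector' key

print(sector_from_info(stock_info))
print(sector_from_info(fund_info, default="Unknown"))
```

With the code from this answer, the call would be sector_from_info(tickerdata.info).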

oo7knutson
0

To answer the question:

How to obtain stock market company sector from ticker or company name in python?

I had to find a workaround after reading some material and some nice suggestions from @keepAlive.

The following does the job in a reverse way, i.e. gets the companies given the sector. There are 10 sectors, so it is not too much work if one wants info for all sectors: https://www.stockmonitor.com/sectors/

Given that marketwatch.com/investing/stock was throwing a 405 Error, I decided to use https://www.stockmonitor.com/sectors/, for example:

https://www.stockmonitor.com/sector/healthcare/

Here is the code:

import pandas as pd

from lxml.html import parse
from urllib.request import Request, urlopen

# Browser-like User-Agent; some sites reject urllib's default one.
user_agent = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/35.0.1916.47 Safari/537.36"
)

url = 'https://www.stockmonitor.com/sector/healthcare/'

req = Request(url, headers={'User-Agent': user_agent})
webpage = urlopen(req)

tree = parse(webpage)

# Each ticker is the text of a link inside a left-aligned table cell.
healthcare_tickers = [
    element.text
    for element in tree.xpath("//tbody/tr/td[@class='text-left']/a")
]

pd.Series(healthcare_tickers)

Thus, healthcare_tickers contains the tickers of the companies in the healthcare sector.
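
To get back to the original direction (ticker to sector), the per-sector lists can be inverted into a lookup table once they have been collected. The sketch below assumes the lists have already been scraped into a dictionary keyed by sector name; the function name build_ticker_to_sector and the sample values are illustrative only and were not scraped from the site.

```python
def build_ticker_to_sector(sector_to_tickers):
    """Invert {sector: [tickers]} into {ticker: sector}."""
    lookup = {}
    for sector, tickers in sector_to_tickers.items():
        for ticker in tickers:
            # Normalise so that ' pfe ' and 'PFE' map to the same key.
            lookup[ticker.strip().upper()] = sector
    return lookup


# Illustrative values only; in practice each list would come from
# scraping one sector page as shown above.
sector_to_tickers = {
    "healthcare": ["JNJ", "PFE"],
    "technology": ["AAPL", "MSFT"],
}

ticker_to_sector = build_ticker_to_sector(sector_to_tickers)

print(ticker_to_sector.get("PFE"))
print(ticker_to_sector.get("XYZ"))  # unknown ticker, so None
```

If a ticker were listed under more than one sector, the last one processed would win in this sketch.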

alejandro