
I am trying to programmatically fetch the list of companies included in the NASDAQ-100. I tried scraping the Nasdaq-100 index components page with Beautiful Soup (bs4), but so far without much success.

How can I get this list (tickers and company names)?

import json

import requests
from bs4 import BeautifulSoup

s = requests.Session()
s.headers.update(
    {
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        "User-Agent": "Java-http-client/",
    }
)
r = s.get("https://www.nasdaq.com/market-activity/quotes/nasdaq-ndx-index")
soup = BeautifulSoup(r.content, "html.parser")
res = json.loads(soup.find("script", {"type": "application/json"}).string)

This only returns a very limited list and I suspect that this naive scraping doesn't really get all the data.

Asclepius
Dror
  • The information you're looking for isn't visible in the HTML. The page is fed dynamically with JavaScript. Open that page in your favourite browser and look at the page source and you'll see what I mean. There's probably an API out there somewhere that will get you this information, but it may not be free –  Jul 22 '21 at 09:26
  • I might be ready to pay for an API, but I cannot find it. Can you point to the line which actually loads the data into the HTML page? I failed to find it. – Dror Jul 22 '21 at 09:28
  • To add to the previous comment: if you really need to fetch these tickers from this specific web page, you can use Selenium, which drives a real browser and therefore sees the data loaded by JavaScript. But finding another source for your tickers will probably be simpler. – Erlinska Jul 22 '21 at 09:29

2 Answers


Since the data is generated dynamically, open Chrome's developer tools, switch to the Network tab, refresh the page, and search for the company data in the request list. You will find the request that returns the company list as JSON:

import requests

headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36"}
res = requests.get("https://api.nasdaq.com/api/quote/list-type/nasdaq100", headers=headers)
main_data = res.json()['data']['data']['rows']

for row in main_data:
    print(row['companyName'])

Output:

Activision Blizzard, Inc. Common Stock
Adobe Inc. Common Stock
Advanced Micro Devices, Inc. Common Stock
Align Technology, Inc. Common Stock
..
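The question asks for tickers as well as names. The same rows appear to carry a `symbol` field alongside `companyName` (an assumption about the API's response shape; inspect `res.json()` to confirm). A minimal sketch, demonstrated on hard-coded sample rows:

```python
# Hypothetical sample mimicking the shape of res.json()['data']['data']['rows'];
# the 'symbol' key is an assumption — verify it against the live response.
sample_rows = [
    {"symbol": "AAPL", "companyName": "Apple Inc. Common Stock"},
    {"symbol": "MSFT", "companyName": "Microsoft Corporation Common Stock"},
]

def tickers_and_names(rows):
    """Return (ticker, company name) pairs from the API's row dicts."""
    return [(row["symbol"], row["companyName"]) for row in rows]

print(tickers_and_names(sample_rows))
```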


Bhavya Parikh

Using QQQ ETF

To obtain an official list of Nasdaq 100 symbols as constituents of the QQQ ETF:

import pandas as pd

def list_qqq_holdings() -> pd.DataFrame:
    # Ref: https://stackoverflow.com/a/75846060/
    # Source: https://www.invesco.com/us/financial-products/etfs/holdings?ticker=QQQ
    url = ...  # the CSV download link from the Source page above (not given in the original answer)
    converters = {'Holding Ticker': lambda s: s.rstrip(' ')}
    return pd.read_csv(url, converters=converters, index_col='Holding Ticker')

Using Wikipedia

To obtain an unofficial list of Nasdaq 100 symbols, pandas.read_html can be used. A parser such as lxml or bs4 plus html5lib is also required, as pandas uses it internally.

Note that the list on Wikipedia can be outdated.

import pandas as pd

def list_wikipedia_nasdaq100() -> pd.DataFrame:
    # Ref: https://stackoverflow.com/a/75846060/
    url = 'https://en.m.wikipedia.org/wiki/Nasdaq-100'
    return pd.read_html(url, attrs={'id': "constituents"}, index_col='Ticker')[0]


>>> df = list_wikipedia_nasdaq100()
>>> df.head()
                    Company  ...                      GICS Sub-Industry
Ticker                       ...                                       
ATVI    Activision Blizzard  ...         Interactive Home Entertainment
ADBE             Adobe Inc.  ...                   Application Software
ADP                     ADP  ...  Data Processing & Outsourced Services
ABNB                 Airbnb  ...     Internet & Direct Marketing Retail
ALGN       Align Technology  ...                   Health Care Supplies
[5 rows x 3 columns]

>>> symbols = df.index.to_list()
>>> symbols[:5]
['ATVI', 'ADBE', 'ADP', 'ABNB', 'ALGN']

Using Slickcharts

import pandas as pd
import requests

def list_slickcharts_nasdaq100() -> pd.DataFrame:
    # Ref: https://stackoverflow.com/a/75846060/
    url = 'https://www.slickcharts.com/nasdaq100'
    user_agent = 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/111.0'  # Default user-agent fails.
    response = requests.get(url, headers={'User-Agent': user_agent})
    return pd.read_html(response.text, match='Symbol', index_col='Symbol')[0]

These were tested with Pandas 1.5.3.

The results can be cached for a certain period of time, e.g. 8 hours, in memory and/or on disk, so as to avoid the risk of excessive repeated calls to the source.
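As a sketch of the in-memory variant, a small time-to-live decorator can wrap any of the fetch functions above (the `cached` decorator and the stubbed fetch body below are illustrative, not part of the original answer):

```python
import time
from functools import wraps

def cached(ttl_seconds: float):
    """Cache a zero-argument function's result in memory for ttl_seconds."""
    def decorator(func):
        state = {"value": None, "expires": 0.0}
        @wraps(func)
        def wrapper():
            now = time.monotonic()
            if now >= state["expires"]:  # expired (or never fetched): refresh
                state["value"] = func()
                state["expires"] = now + ttl_seconds
            return state["value"]
        return wrapper
    return decorator

call_count = 0

@cached(ttl_seconds=8 * 3600)  # refetch at most every 8 hours
def nasdaq100_symbols():
    # In real use, call e.g. list_wikipedia_nasdaq100() here; a stub stands in.
    global call_count
    call_count += 1
    return ["ATVI", "ADBE", "ADP"]

nasdaq100_symbols()
nasdaq100_symbols()  # served from cache; the underlying fetch runs only once
```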

A similar answer for the S&P 500 is here.

Asclepius