
I am trying to extract the ticker symbols from this website: https://www.capitoltrades.com/trades?txType=buy&tradeSize=4&tradeSize=5&tradeSize=6&tradeSize=7&tradeSize=8&tradeSize=9&tradeSize=10

but for some reason I can't do it with the code below. I use this exact code on other JavaScript-enabled websites all the time without problems, so I'm not sure what's happening here.

Does anyone know how to fix this?

from requests_html import HTMLSession

url = 'https://www.capitoltrades.com/trades?txType=buy&tradeSize=4&tradeSize=5&tradeSize=6&tradeSize=7&tradeSize=8&tradeSize=9&tradeSize=10'

s = HTMLSession()
r = s.get(url)

r.html.render(sleep=1)

products = r.html.find('span.q-field issuer-ticker')

for product in products:
    print(product.text)
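For reference, a space in a CSS selector means "descendant". So `span.q-field issuer-ticker` looks for an `<issuer-ticker>` element nested inside `span.q-field`, while `span.q-field.issuer-ticker` (classes chained with a dot) matches a single span that carries both classes, as in `class="q-field issuer-ticker"`. The stdlib sketch below illustrates that chained-class match without requests_html. `ClassSelector` is a hypothetical helper, and the markup is made up, so the real page's HTML may differ.

```python
from html.parser import HTMLParser


class ClassSelector(HTMLParser):
    """Collect the text of <tag> elements that carry ALL of the given classes.

    Roughly equivalent to the CSS selector `span.q-field.issuer-ticker`.
    """

    def __init__(self, tag, classes):
        super().__init__()
        self.tag = tag
        self.wanted = set(classes)
        self.depth = 0  # > 0 while inside a matching element
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        if self.depth:
            self.depth += 1  # same tag nested inside a match
            return
        # class="q-field issuer-ticker" means two classes: q-field and issuer-ticker
        present = set((dict(attrs).get("class") or "").split())
        if self.wanted <= present:
            self.depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


# Made-up markup for illustration only.
sample = """
<div>
  <span class="q-field issuer-ticker">COP:US</span>
  <span class="q-field issuer-name">Conocophillips</span>
</div>
"""

parser = ClassSelector("span", ["q-field", "issuer-ticker"])
parser.feed(sample)
tickers = [text.strip() for text in parser.results]
print(tickers)  # ['COP:US']
```

With requests_html, the corresponding call would be `r.html.find('span.q-field.issuer-ticker')`, although that alone may not help if the ticker elements never appear in the rendered HTML.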
Joey
  • Sorry, I misread the question - I assumed it was yet another BeautifulSoup question, but it seems you are using a library that does, in fact, invoke a JavaScript engine and see what results. The next obvious question is - where you have `sleep=1`, maybe that isn't enough time? Did you try to check what's in the `r.html` after rendering, manually? – Karl Knechtel Apr 02 '23 at 06:53
  • I think Karl covered most of it nicely. I would wait a few seconds and then add an if condition that checks whether the last element of the HTML page has loaded, so that 'span.q-field issuer-ticker' is guaranteed to be present. – Suchandra T Apr 02 '23 at 06:58
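The "wait, then check that the element has loaded" idea from the comments can be written as a small polling helper. This is a generic sketch: `wait_for` is a hypothetical name, and the requests_html usage shown in the comments is untested.

```python
import time


def wait_for(check, timeout=10.0, interval=0.5):
    """Call check() until it returns something truthy or `timeout` seconds elapse.

    Returns the truthy result, or None if the timeout is reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Possible (untested) use with requests_html, re-rendering the page on each try:
# def tickers_loaded():
#     r.html.render(sleep=3)
#     return r.html.find('span.q-field.issuer-ticker')
# products = wait_for(tickers_loaded, timeout=60, interval=1)

# Self-contained demo: a fake check that "finds" the element on its third call.
attempts = []


def fake_check():
    attempts.append(1)  # record one call
    return ["COP:US"] if len(attempts) >= 3 else []


found = wait_for(fake_check, timeout=5, interval=0.01)
print(found, len(attempts))  # ['COP:US'] 3
```

Using `time.monotonic()` for the deadline keeps the timeout correct even if the system clock changes while waiting.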

1 Answer


Perhaps you can load the table data directly from their Ajax API instead of rendering the page. Example:

import requests
import pandas as pd


url = 'https://bff.capitoltrades.com/trades'

# Same filters as the page URL; requests sends the tradeSize list as repeated tradeSize=... parameters
params = {
    "txType": "buy",
    "tradeSize": ["4", "5", "6", "7", "8", "9", "10"],
    "page": "1",
    "pageSize": "12",
}

all_data = []
for params['page'] in range(1, 4):  # <-- increase number of pages here
    # the trade records are in the 'data' list of each JSON response
    all_data.extend(requests.get(url, params=params).json()['data'])

df = pd.DataFrame(all_data)
# flatten the nested asset/issuer/politician dicts into prefixed columns
df = pd.concat([df, df.pop('asset').apply(pd.Series).add_prefix('asset_')], axis=1)
df = pd.concat([df, df.pop('issuer').apply(pd.Series).add_prefix('issuer_')], axis=1)
df = pd.concat([df, df.pop('politician').apply(pd.Series).add_prefix('politician_')], axis=1)

print(df)

Prints:

_txId _politicianId _assetId _issuerId pubDate filingDate txDate txType txTypeExtended hasCapitalGains owner chamber price size sizeRangeHigh sizeRangeLow value filingId filingURL reportingGap comment committees labels asset_assetType asset_assetTicker asset_instrument issuer__stateId issuer_c2iq issuer_country issuer_issuerName issuer_issuerTicker issuer_sector issuer_lastEOD politician__stateId politician_chamber politician_dob politician_firstName politician_gender politician_lastName politician_nickname politician_party
20003761503 P000608 100012044 435544 2023-03-29T13:05:01Z 2023-03-28 2023-02-07 buy False spouse house nan nan nan nan 175000 204584522 https://disclosures-clerk.house.gov/public_disc/ptr-pdfs/2023/20022633.pdf 49 [] [] municipal-security US TREASURY BILLS nan ca house 1958-06-17 Scott male Peters democrat
10000060750 S001217 100005792 430468 2023-03-23T17:15:11Z 2023-03-23 2023-02-24 buy False spouse senate nan nan nan nan 750000 100114523 https://efdsearch.senate.gov/search/view/ptr/83de647b-ddf0-49c3-bd56-8b32f23c0e78/ 27 Rate/Coupon: 5.0% Matures: 01/01/2040 [] [] municipal-security CENTRAL TEXAS REGIONAL MOBILITY AUTHORITY nan fl senate 1952-12-01 Richard male Scott Rick republican
20003761347 M001157 100006340 430955 2023-03-23T13:05:01Z 2023-03-20 2023-02-13 buy False spouse house 112.31 1559 2226 891 175000 204572563 https://disclosures-clerk.house.gov/public_disc/ptr-pdfs/2023/8219432.pdf 35 [] [] stock COP:US tx A2QVU1B8 us Conocophillips COP:US energy ['2022-04-01', 100.58] tx house 1962-01-14 Michael male McCaul republican
20003761348 M001157 100010442 434294 2023-03-23T13:05:01Z 2023-03-20 2023-02-09 buy False spouse house nan nan nan nan 175000 204572563 https://disclosures-clerk.house.gov/public_disc/ptr-pdfs/2023/8219432.pdf 39 [] [] municipal-security RACINE UNIFIED SCHOOL DISTRICT nan tx house 1962-01-14 Michael male McCaul republican
20003761349 M001157 100006594 431178 2023-03-23T13:05:01Z 2023-03-20 2023-02-02 buy False spouse house nan nan nan nan 175000 204572563 https://disclosures-clerk.house.gov/public_disc/ptr-pdfs/2023/8219432.pdf 46 [] [] municipal-security CITIES OF DALLAS AND FORT WORTH TEXAS nan tx house 1962-01-14 Michael male McCaul republican
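If you only need the ticker symbols, you can pull them straight from the JSON records instead of building the full DataFrame. The sketch below assumes each record has an `issuer` dict with an `issuerTicker` key (as the `issuer_issuerTicker` column above suggests) and that missing tickers come back as null (shown as nan above). `build_url` and `extract_tickers` are hypothetical helpers. The sketch also uses `urllib.parse.urlencode(..., doseq=True)` to show how the `tradeSize` list becomes repeated query parameters, in the same way as the page URL in the question.

```python
from urllib.parse import urlencode

BASE_URL = "https://bff.capitoltrades.com/trades"


def build_url(page, page_size=12):
    """Build the API URL for one page of results (same filters as the answer)."""
    params = {
        "txType": "buy",
        "tradeSize": ["4", "5", "6", "7", "8", "9", "10"],
        "page": page,
        "pageSize": page_size,
    }
    # doseq=True expands list values into repeated keys: tradeSize=4&tradeSize=5&...
    return BASE_URL + "?" + urlencode(params, doseq=True)


def extract_tickers(records):
    """Return unique issuer tickers in order, skipping records without one."""
    tickers = []
    for record in records:
        ticker = (record.get("issuer") or {}).get("issuerTicker")
        if ticker and ticker not in tickers:
            tickers.append(ticker)
    return tickers


# Assumed record shape, trimmed to the fields used here.
sample_records = [
    {"issuer": {"issuerName": "Conocophillips", "issuerTicker": "COP:US"}},
    {"issuer": {"issuerName": "US TREASURY BILLS", "issuerTicker": None}},
    {"issuer": {"issuerName": "Conocophillips", "issuerTicker": "COP:US"}},
]

# Real use (untested): records = requests.get(build_url(1)).json()['data']
print(build_url(1))
print(extract_tickers(sample_records))  # ['COP:US']
```

With the DataFrame from the answer, `df['issuer_issuerTicker'].dropna().unique()` should give a similar list.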
Andrej Kesely