I need to scrape data from this URL:
The pre-2018 data is easy to scrape. However, the data after 2018.12.18 is loaded dynamically with JavaScript, and I don't know how to scrape it. Can anyone help me? I want to do it using Python.
To load the CSV for selected data, you can use this script:
import requests
import pandas as pd
from io import StringIO
url = 'https://api.finra.org/data/group/otcMarket/name/otcDailyList'
data = {
    "offset": 0,
    "compareFilters": [
        {
            "fieldName": "calendarDay",
            "fieldValue": "2019-12-03",  # <--- Change to date you need
            "compareType": "EQUAL"
        }
    ],
    "delimiter": "|",
    "limit": 5000,
    "quoteValues": False,
    "fields": [
        "dailyListDatetime",
        "dailyListReasonDescription",
        "newSymbolCode",
        "oldSymbolCode",
        "newSecurityDescription",
        "oldSecurityDescription",
        "exDate",
        "commentText",
        "newMarketCategoryCode",
        "oldMarketCategoryCode",
        "newOATSReportableFlag",
        "oldOATSReportableFlag",
        "newRoundLotQuantity",
        "oldRoundLotQuantity",
        "newRegFeeFlag",
        "oldRegFeeFlag",
        "newClassText",
        "oldClassText",
        "newFinancialStatusCode",
        "oldFinancialStatusCode",
        "subjectCorporateActionCode",
        "newADROrdnyShareRate",
        "oldADROrdinaryShareRate",
        "newMaturityExpirationDate",
        "oldMaturityExpirationDate",
        "offeringTypeDescription",
        "forwardSplitRate",
        "reverseSplitRate",
        "dividendTypeCode",
        "stockPercentage",
        "cashAmountText",
        "declarationDate",
        "recordDate",
        "paymentDate",
        "paymentMethodCode",
        "ADRFeeAmount",
        "ADRTaxReliefAmount",
        "ADRGrossRate",
        "ADRNetRate",
        "ADRIssuanceFeeAmount",
        "ADRWitholdingTaxPercentage",
        "qualifiedDividendDescription"
    ],
    "sortFields": [
        "-dailyListDatetime"
    ]
}
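# Widen the console output so more columns fit on one line
pd.set_option('display.width', 200)
pd.set_option('display.max_columns', 8)
# POST the query as JSON and parse the pipe-delimited response
response = requests.post(url, json=data)
response.raise_for_status()  # stop early on an HTTP error
df = pd.read_csv(StringIO(response.text), delimiter='|')
print(df)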
Prints:
        dailyListDatetime                dailyListReasonDescription  newSymbolCode  oldSymbolCode  ...  ADRNetRate  ADRIssuanceFeeAmount  ADRWitholdingTaxPercentage  qualifiedDividendDescription
 0  2019-12-03 18:33:52.0                                  Addition            NaN          WDTRF  ...         NaN                   NaN                         NaN                           NaN
 1  2019-12-03 17:03:47.0                     Cash Dividend Regular          TTDKY          TTDKY  ...    0.679513                   NaN                        15.0                           NaN
 2  2019-12-03 16:53:45.0                     Cash Dividend Regular          BNDSY          BNDSY  ...    0.031600                   NaN                        19.0                           NaN
 3  2019-12-03 16:48:12.0                     Cash Dividend Regular          FUJHY          FUJHY  ...    0.244223                   NaN                        15.0                           NaN
 4  2019-12-03 16:46:41.0                     Cash Dividend Regular          NNCHY          NNCHY  ...    0.279709                   NaN                        15.0                           NaN
..                    ...                                       ...            ...            ...  ...         ...                   ...                         ...                           ...
66  2019-12-03 00:00:00.0                Reverse Split/CUSIP Change          ERRCD          ERRCF  ...         NaN                   NaN                         NaN                           NaN
67  2019-12-03 00:00:00.0  Subject to Corporate Action Flag Removed          MCLDF          MCLDF  ...         NaN                   NaN                         NaN                           NaN
68  2019-12-03 00:00:00.0                         Name/CUSIP Change          PEMTF          PEMTF  ...         NaN                   NaN                         NaN                           NaN
69  2019-12-03 00:00:00.0  Subject to Corporate Action Flag Removed          SATIF          SATIF  ...         NaN                   NaN                         NaN                           NaN
70  2019-12-03 00:00:00.0                             Symbol Change          ITXXF          RVLYF  ...         NaN                   NaN                         NaN                           NaN

[71 rows x 42 columns]
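If you need more than one day (for example, everything since 2018-12-18), one option is to reuse the `data` dictionary from the script as a template and loop over the dates. This is only a sketch: `payload_for_day`, `iter_days` and `fetch_range` are names I made up for illustration, and I'm assuming the API may return an empty body for days without entries, which I haven't verified.

```python
import copy
from datetime import date, timedelta
from io import StringIO

import pandas as pd
import requests

URL = 'https://api.finra.org/data/group/otcMarket/name/otcDailyList'


def payload_for_day(template, day):
    """Return a copy of the request body with calendarDay set to `day`."""
    payload = copy.deepcopy(template)  # leave the caller's dict untouched
    payload["compareFilters"][0]["fieldValue"] = day.isoformat()
    return payload


def iter_days(start, end):
    """Yield every calendar day from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def fetch_range(template, start, end):
    """Download each day in the range and concatenate the results."""
    frames = []
    for day in iter_days(start, end):
        response = requests.post(URL, json=payload_for_day(template, day))
        response.raise_for_status()
        # Assumption: days with no entries may come back with an empty body,
        # which pd.read_csv cannot parse, so skip them.
        if response.text.strip():
            frames.append(pd.read_csv(StringIO(response.text), delimiter='|'))
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

With the `data` dictionary from the script, `fetch_range(data, date(2018, 12, 18), date(2018, 12, 31))` would send one request per day. Consider adding a short `time.sleep` between requests to be polite to the server.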