I need to scrape data from this URL:

The pre-2018 data is easy to scrape. However, the data after 2018.12.18 is loaded dynamically with JavaScript, and I don't know how to scrape it. Can anyone help me? I want to do it in Python.

Ole Pannier
clement
  • You need a full-fledged headless browser like PhantomJS or Puppeteer, not a simple HTTP GET request. – Jeremy Thille Sep 08 '20 at 15:06
  • Duplicate of [Get page generated with Javascript in Python](https://stackoverflow.com/questions/8960288/get-page-generated-with-javascript-in-python) – esqew Sep 08 '20 at 15:06

1 Answer

To load the CSV data for a selected date, you can use this script:

import requests
import pandas as pd
from io import StringIO


url = 'https://api.finra.org/data/group/otcMarket/name/otcDailyList'

data = {
  "offset": 0,
  "compareFilters": [
    {
      "fieldName": "calendarDay",
      "fieldValue": "2019-12-03",       # <--- Change to date you need
      "compareType": "EQUAL"
    }
  ],
  "delimiter": "|",
  "limit": 5000,
  "quoteValues": False,
  "fields": [
    "dailyListDatetime",
    "dailyListReasonDescription",
    "newSymbolCode",
    "oldSymbolCode",
    "newSecurityDescription",
    "oldSecurityDescription",
    "exDate",
    "commentText",
    "newMarketCategoryCode",
    "oldMarketCategoryCode",
    "newOATSReportableFlag",
    "oldOATSReportableFlag",
    "newRoundLotQuantity",
    "oldRoundLotQuantity",
    "newRegFeeFlag",
    "oldRegFeeFlag",
    "newClassText",
    "oldClassText",
    "newFinancialStatusCode",
    "oldFinancialStatusCode",
    "subjectCorporateActionCode",
    "newADROrdnyShareRate",
    "oldADROrdinaryShareRate",
    "newMaturityExpirationDate",
    "oldMaturityExpirationDate",
    "offeringTypeDescription",
    "forwardSplitRate",
    "reverseSplitRate",
    "dividendTypeCode",
    "stockPercentage",
    "cashAmountText",
    "declarationDate",
    "recordDate",
    "paymentDate",
    "paymentMethodCode",
    "ADRFeeAmount",
    "ADRTaxReliefAmount",
    "ADRGrossRate",
    "ADRNetRate",
    "ADRIssuanceFeeAmount",
    "ADRWitholdingTaxPercentage",
    "qualifiedDividendDescription"
  ],
  "sortFields": [
    "-dailyListDatetime"
  ]
}

pd.set_option('display.width', 200)
pd.set_option('display.max_columns', 8)

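response = requests.post(url, json=data)
response.raise_for_status()  # fail early on an HTTP error instead of parsing an error page

# The API returns pipe-delimited text, so parse it as CSV with '|' as the separator
df = pd.read_csv(StringIO(response.text), sep='|')
print(df)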

Prints:

        dailyListDatetime                dailyListReasonDescription newSymbolCode oldSymbolCode  ... ADRNetRate ADRIssuanceFeeAmount ADRWitholdingTaxPercentage qualifiedDividendDescription
0   2019-12-03 18:33:52.0                                  Addition           NaN         WDTRF  ...        NaN                  NaN                        NaN                          NaN
1   2019-12-03 17:03:47.0                     Cash Dividend Regular         TTDKY         TTDKY  ...   0.679513                  NaN                       15.0                          NaN
2   2019-12-03 16:53:45.0                     Cash Dividend Regular         BNDSY         BNDSY  ...   0.031600                  NaN                       19.0                          NaN
3   2019-12-03 16:48:12.0                     Cash Dividend Regular         FUJHY         FUJHY  ...   0.244223                  NaN                       15.0                          NaN
4   2019-12-03 16:46:41.0                     Cash Dividend Regular         NNCHY         NNCHY  ...   0.279709                  NaN                       15.0                          NaN
..                    ...                                       ...           ...           ...  ...        ...                  ...                        ...                          ...
66  2019-12-03 00:00:00.0                Reverse Split/CUSIP Change         ERRCD         ERRCF  ...        NaN                  NaN                        NaN                          NaN
67  2019-12-03 00:00:00.0  Subject to Corporate Action Flag Removed         MCLDF         MCLDF  ...        NaN                  NaN                        NaN                          NaN
68  2019-12-03 00:00:00.0                         Name/CUSIP Change         PEMTF         PEMTF  ...        NaN                  NaN                        NaN                          NaN
69  2019-12-03 00:00:00.0  Subject to Corporate Action Flag Removed         SATIF         SATIF  ...        NaN                  NaN                        NaN                          NaN
70  2019-12-03 00:00:00.0                             Symbol Change         ITXXF         RVLYF  ...        NaN                  NaN                        NaN                          NaN

[71 rows x 42 columns]
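
The script above fetches one day at a time, while the question asks for everything after a given date. A minimal sketch of how the same payload could be generated for a range of dates follows. The helper names `payload_for_day` and `weekdays` are hypothetical, the field list is shortened for illustration, and skipping weekends is an assumption about when lists are published, not something the API documents here.

```python
from datetime import date, timedelta

# Shortened for illustration; use the full field list from the script above.
FIELDS = [
    "dailyListDatetime",
    "dailyListReasonDescription",
    "newSymbolCode",
    "oldSymbolCode",
]


def payload_for_day(day, fields=FIELDS, limit=5000):
    """Build the JSON body for one calendar day, mirroring the payload above."""
    return {
        "offset": 0,
        "compareFilters": [
            {
                "fieldName": "calendarDay",
                "fieldValue": day.isoformat(),  # e.g. "2019-12-03"
                "compareType": "EQUAL",
            }
        ],
        "delimiter": "|",
        "limit": limit,
        "quoteValues": False,
        "fields": list(fields),
        "sortFields": ["-dailyListDatetime"],
    }


def weekdays(start, end):
    """Yield every Monday-to-Friday date from start to end, inclusive.

    Assumption: nothing is published on weekends. Holidays are not
    skipped, so some days may simply come back empty.
    """
    day = start
    while day <= end:
        if day.weekday() < 5:  # 0 = Monday ... 4 = Friday
            yield day
        day += timedelta(days=1)


payloads = [payload_for_day(d) for d in weekdays(date(2019, 12, 2), date(2019, 12, 8))]
print([p["compareFilters"][0]["fieldValue"] for p in payloads])
# → ['2019-12-02', '2019-12-03', '2019-12-04', '2019-12-05', '2019-12-06']
```

Each payload can then be sent with `requests.post(url, json=payload)` exactly as in the script, and the resulting frames combined with `pd.concat`. It is probably wise to check for an empty response body before calling `pd.read_csv`, and to pause between requests.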
Andrej Kesely