0

I used python 3 and pandas to parse the daily close from WSJ into EXCEL. However, the daily close shown on the web page screen cannot be extracted. Here is the link: "https://quotes.wsj.com/index/COMP/historical-prices" How to download the close data on screen into excel? and how to download "DOWNLOAD A SPREADSHEET" button file into excel with another name like comp.xlxs ?

Here are the codes:

import requests
import pandas as pd

url = 'https://quotes.wsj.com/index/COMP/historical-prices'

jsonData = requests.get(url).json()

final_df = pd.DataFrame()
for row in jsonData['data']:
    #row = jsonData['data'][1]

    data_row = []
    for idx, colspan in enumerate(row['colspan']):
        colspan_int = int(colspan[0])
        data_row.append(row['td'][idx] * colspan_int)
        flat_list = [item for sublist in data_row for item in sublist]
    temp_row = pd.DataFrame([flat_list])
    final_df = final_df.append(temp_row, sort=True).reset_index(drop=True)


wait2 = input("PRESS ENTER TO CONTINUE.")

Follow UP question quotes:

#
url = 'https://quotes.wsj.com/index/HK/XHKG/HSI/historical-prices/download?num_rows=15&range_days=15&endDate=12/06/2019'
response = requests.get(url)
open('HSI.csv', 'wb').write(response.content)
read_file = pd.read_csv (r'C:\A-CEO\REPORTS\STOCKS\PROFILE\Python\HSI.csv')
read_file.to_excel (r'C:\A-CEO\REPORTS\STOCKS\PROFILE\Python\HSI.xlsx', index = None, header=True)

#
url = 'https://quotes.wsj.com/index/SPX/historical-prices/download?num_rows=15&range_days=15&endDate=12/06/2019'
response = requests.get(url)
open('SPX.csv', 'wb').write(response.content)
read_file = pd.read_csv (r'C:\A-CEO\REPORTS\STOCKS\PROFILE\Python\SPX.csv')
read_file.to_excel (r'C:\A-CEO\REPORTS\STOCKS\PROFILE\Python\SPX.xlsx', index = None, header=True)


#
url = 'https://quotes.wsj.com/index/COMP/historical-prices/download?num_rows=15&range_days=15&endDate=12/06/2019'
response = requests.get(url)
open('COMP.csv', 'wb').write(response.content)
read_file = pd.read_csv (r'C:\A-CEO\REPORTS\STOCKS\PROFILE\Python\COMP.csv')
read_file.to_excel (r'C:\A-CEO\REPORTS\STOCKS\PROFILE\Python\COMP.xlsx', index = None, header=True)
Arthur Law
  • 111
  • 7
  • If you want to write a script to download the spreadsheet from the button you can use Selenium webdriver. [This post](https://stackoverflow.com/questions/18439851/how-can-i-download-a-file-on-a-click-event-using-selenium) would be a good start. – Joseph Rajchwald Nov 06 '19 at 15:47

1 Answers1

0

the URL is wrong; once downloaded you can do "Get Info" if on a Mac, and you'll see "Where From:". You will see it's of the form below.

import requests
import pandas as pd
import io

#original URL had a bunch of other parameters I omitted, only these seem to matter but YMMV
url = 'https://quotes.wsj.com/index/COMP/historical-prices/download?num_rows=360&range_days=360&endDate=11/06/2019'

response = requests.get(url)

#do this if you want the CSV written to your machine
open('test_file.csv', 'wb').write(response.content)

# this decodes the content of the downloaded response and presents it to pandas
df_test = pd.read_csv(io.StringIO(response.content.decode('utf-8')))

To answer your additional question -- you can simply loop across a list of tickers or symbols, something like:

base_url = 'https://quotes.wsj.com/index/{ticker_name}/historical-prices/download?num_rows=360&range_days=360&endDate=11/06/2019'


ticker_list = ['COMP','SPX','HK/XHKG/HSI']

for ticker in ticker_list:
    response = requests.get(base_url.format(ticker_name = ticker))
    #do this if you want the CSV written to your machine
    open('prices_'+ticker.replace('/','-')+'.csv', 'wb').write(response.content)

Note for HK/XHKG/HSI, we need to replace the slashes with hyphens or it's not a valid filename. You can also use this pattern to make dataframes.

wij
  • 1,304
  • 1
  • 8
  • 9
  • Thank you for your advice. I tried to down download multiple indices. How to do it with loop and save in respective file names. The name is part of the path like "COMP", "SPX", "HSI" ... Please see the "Follow UP question quotes which I put in the original question part. – Arthur Law Nov 07 '19 at 14:56
  • See above, basically use a for loop. – wij Nov 09 '19 at 06:26