I'm running a program to pull some info from Yahoo! Finance. It runs fine as a for loop, but it takes a long time (about 10 minutes for 7,000 inputs) because it has to process each requests.get(url) individually (or am I mistaken about the main bottleneck?).

Anyway, I came across multithreading as a potential solution. This is what I have tried:

import requests
import pprint
import threading

with open('MFTop30MinusAFew.txt', 'r') as ins: #input file for tickers
    for line in ins:
        ticker_array = ins.read().splitlines()

ticker = ticker_array
url_array = []
url_data = []
data_array =[]

for i in ticker:
    url = 'https://query2.finance.yahoo.com/v10/finance/quoteSummary/'+i+'?formatted=true&crumb=8ldhetOu7RJ&lang=en-US&region=US&modules=defaultKeyStatistics%2CfinancialData%2CcalendarEvents&corsDomain=finance.yahoo.com'
    url_array.append(url) #loading each complete url at one time 

def fetch_data(url):
    urlHandler = requests.get(url)
    data = urlHandler.json()
    data_array.append(data)

pprint.pprint(data_array)

threads = [threading.Thread(target=fetch_data, args=(url,)) for url in url_array]

for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

fetch_data(url_array)

The error I get is InvalidSchema: No connection adapters were found for '['https://query2.finance.... [url continues].

PS. I've also read that using a multithreaded approach to scrape websites is bad and can get you blocked. Would Yahoo! Finance mind if I pull data for a couple thousand tickers at once? Nothing happened when I requested them sequentially.

Rafael
  • I will point out that a perfectly good [Python package already exists](https://pypi.python.org/pypi/yahoo-finance/1.1.4) on PyPI for making requests to Yahoo! Finance. It won't help you make more requests faster, but it'll be a lot nicer than writing your own logic for getting values. – Akshat Mahajan Sep 06 '16 at 23:33
  • I've seen this! But they don't have methods for all the numbers I need. – Rafael Sep 07 '16 at 01:27

1 Answer

If you look carefully at the error, you will notice that it shows not one URL but all the URLs you appended, enclosed in brackets. The last line of your code calls fetch_data with the full list as its argument, so requests.get receives the whole list rather than a single URL string, which doesn't make sense. If you remove this last line, the code runs just fine and your threads are called as expected.
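For reference, here is a minimal sketch of how the script could look with that last call removed. It is not the only way to structure it, and a few things are my own assumptions rather than part of the question: the helper names build_url, http_get_json, fetch_all and main are hypothetical, the lock around the shared list is an extra precaution, and the file is read without the outer for loop, which in the original appears to consume the first ticker. It also prints the results only after every thread has been joined, since printing before the threads start would show an empty list.

```python
import threading
from pprint import pprint

# URL pieces copied from the question.
BASE_URL = "https://query2.finance.yahoo.com/v10/finance/quoteSummary/"
QUERY = ("?formatted=true&crumb=8ldhetOu7RJ&lang=en-US&region=US"
         "&modules=defaultKeyStatistics%2CfinancialData%2CcalendarEvents"
         "&corsDomain=finance.yahoo.com")


def build_url(ticker):
    """Build the quoteSummary URL for a single ticker (hypothetical helper)."""
    return BASE_URL + ticker + QUERY


def http_get_json(url):
    """Fetch one URL and decode its JSON body, as in the question."""
    import requests  # imported here so the threading logic runs without it
    return requests.get(url).json()


def fetch_all(urls, fetch=http_get_json):
    """Call fetch(url) in one thread per URL and return the collected results."""
    results = []
    lock = threading.Lock()

    def worker(url):
        data = fetch(url)  # receives one URL string, never the whole list
        with lock:         # guard the list shared by all threads
            results.append(data)

    threads = [threading.Thread(target=worker, args=(url,)) for url in urls]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait for every request before reading results
    return results


def main(path="MFTop30MinusAFew.txt"):
    """Read tickers from the input file, fetch them all, and print the data."""
    with open(path) as ins:
        tickers = ins.read().splitlines()
    pprint(fetch_all([build_url(ticker) for ticker in tickers]))
```

Calling main() runs the whole thing. Results arrive in completion order, not input order, so keep the ticker alongside each result if you need to match them up. Note that this still starts one thread per URL; with several thousand tickers, a bounded pool such as concurrent.futures.ThreadPoolExecutor with a modest max_workers is probably gentler on both your machine and the server.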

Tanguy A.