In my code I generate many different URLs and pull a specific table from each page. Without concurrency the process is very slow, and I would like to optimize for speed.
from lxml import html

for eachTicker in ticker_list:
    bs_url = 'http://finance.yahoo.com/q/bs?s=%s' % eachTicker
    is_url = 'http://finance.yahoo.com/q/is?s=%s' % eachTicker
    cf_url = 'http://finance.yahoo.com/q/cf?s=%s' % eachTicker
    bs_tree = html.parse(bs_url)
    is_tree = html.parse(is_url)
    cf_tree = html.parse(cf_url)
    cf_content = cf_tree.xpath("//table[@class='yfnc_tabledata1']/tr[1]/td/table/tr/td")
    bs_content = bs_tree.xpath("//table[@class='yfnc_tabledata1']/tr[1]/td/table/tr/td")
    is_content = is_tree.xpath("//table[@class='yfnc_tabledata1']/tr[1]/td/table/tr/td")
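For comparison, here is a minimal sketch of how the blocking loop above could be parallelized with a standard-library thread pool instead of asyncio; since `html.parse()` just blocks on network I/O, threads can overlap the downloads. The helper names `urls_for` and `fetch_trees` and the `max_workers` value are my own assumptions, and the XPath is the one from the loop above.

```python
from concurrent.futures import ThreadPoolExecutor
from lxml import html

XPATH = "//table[@class='yfnc_tabledata1']/tr[1]/td/table/tr/td"

def urls_for(ticker):
    # The three statement pages (balance sheet, income, cash flow) per ticker.
    return ['http://finance.yahoo.com/q/%s?s=%s' % (page, ticker)
            for page in ('bs', 'is', 'cf')]

def fetch_trees(ticker_list, max_workers=10):
    # Flatten all tickers into one URL list, download concurrently,
    # then run the same XPath query on each parsed tree.
    urls = [url for ticker in ticker_list for url in urls_for(ticker)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        trees = list(pool.map(html.parse, urls))
    return [tree.xpath(XPATH) for tree in trees]
```

This keeps the parsing code unchanged and only swaps the sequential downloads for concurrent ones.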
I want to use asynchronous I/O (asyncio) to make this process much faster. Any ideas?

I'm currently playing around with the code below to see if I can get it to work. I'd like to put it in a for loop and run a list of URLs through it:
import asyncio
import aiohttp

@asyncio.coroutine
def print_page(url):
    response = yield from aiohttp.request('GET', url)
    body = yield from response.read_and_close(decode=False)
    print(body)

loop = asyncio.get_event_loop()
loop.run_until_complete(print_page('http://www.google.com/'))
loop.run_until_complete(asyncio.wait([print_page('http://www.finance.yahoo.com/q/cf?s=ABT'),
                                      print_page('http://www.finance.yahoo.com/q/cf?s=MMM')]))
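One possible shape for the full ticker loop, using the modern async/await syntax (Python 3.7+, aiohttp 3.x) rather than the older `@asyncio.coroutine` / `yield from` style shown above. The names `statement_urls`, `fetch`, and `fetch_all` are hypothetical helpers I made up; the Yahoo URLs come from the question and may no longer serve these pages.

```python
import asyncio
import aiohttp
from lxml import html

def statement_urls(ticker):
    # Same three statement pages per ticker as in the original loop.
    return ['http://finance.yahoo.com/q/%s?s=%s' % (page, ticker)
            for page in ('bs', 'is', 'cf')]

async def fetch(session, url):
    # One GET request; reusing the session lets aiohttp pool connections.
    async with session.get(url) as response:
        return await response.text()

async def fetch_all(ticker_list):
    async with aiohttp.ClientSession() as session:
        urls = [u for t in ticker_list for u in statement_urls(t)]
        # gather() schedules every request concurrently on one event loop.
        bodies = await asyncio.gather(*(fetch(session, url) for url in urls))
    # Parsing is CPU work, so do it after the downloads have completed.
    return [html.fromstring(body) for body in bodies]

if __name__ == '__main__':
    trees = asyncio.run(fetch_all(['ABT', 'MMM']))
```

The key change from the snippet above is that all URLs are turned into coroutines up front and awaited together with `asyncio.gather`, rather than calling `run_until_complete` once per page.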