I can scrape the target values from a date-specific URL. How should I set up datetime and the scraping so that URLs that do not have the target table are skipped? This is the code I have so far:
import datetime
import requests
from bs4 import BeautifulSoup
date = datetime.datetime.today()
url = "http://www.wsj.com/mdc/public/page/2_3022-mfsctrscan-moneyflow-20161205.html?mod=mdc_pastcalendar"
I know I can substitute {date} into the URL to make the date dynamic; I supplied a static URL here in case the URL for the current date is blank.
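# Fill in the date; with a {date} placeholder in url this yields the dated address,
# and with the static url above format() leaves the string unchanged
address = url.format(date=date.strftime('%Y%m%d'))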
print('Retrieving information from: ' + address)
print('\n')
soup = BeautifulSoup(requests.get(address).content, "lxml")
Scraping proceeds as:
rows = soup.select('div#column0 table tr')[2:]
headers = ['name', 'last', 'chg', 'pct_chg',
           'total_money_flow', 'total_tick_up', 'total_tick_down', 'total_up_down_ratio',
           'block_money_flow', 'block_tick_up', 'block_tick_down', 'block_up_down_ratio']
for row in rows:
    # skip non-data rows; find() returns a Tag or None, never True, so test truthiness
    if row.find("td", class_="b14"):
        continue
    print(dict(zip(headers, [cell.get_text(strip=True) for cell in row.find_all('td')])))
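A minimal sketch of one way the date loop and the skip could fit together. It assumes Python 3, that the date is the only part of the URL that changes, and that a page without the target table simply yields no rows for the selector. The names URL_TEMPLATE, build_url, fetch_html, parse_rows and scrape_range are hypothetical, and the sketch uses urlopen from the standard library with the built-in "html.parser" (rather than requests and lxml) only to keep it self-contained. It has not been run against the live site.

```python
import datetime
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Assumption: the static URL from the question with the date swapped for {date}.
URL_TEMPLATE = ("http://www.wsj.com/mdc/public/page/"
                "2_3022-mfsctrscan-moneyflow-{date}.html?mod=mdc_pastcalendar")

HEADERS = ['name', 'last', 'chg', 'pct_chg',
           'total_money_flow', 'total_tick_up', 'total_tick_down', 'total_up_down_ratio',
           'block_money_flow', 'block_tick_up', 'block_tick_down', 'block_up_down_ratio']


def build_url(day):
    """Format a date (or datetime) into the dated URL."""
    return URL_TEMPLATE.format(date=day.strftime('%Y%m%d'))


def fetch_html(url):
    """Return the page body as text, or None if the request fails."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode('utf-8', errors='replace')
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return None


def parse_rows(html):
    """Return one dict per data row; an empty list means no target table."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.select('div#column0 table tr')[2:]
    data = []
    for row in rows:
        # Skip non-data rows, as in the loop above.
        if row.find("td", class_="b14"):
            continue
        cells = [cell.get_text(strip=True) for cell in row.find_all('td')]
        if cells:
            data.append(dict(zip(HEADERS, cells)))
    return data


def scrape_range(start, days, fetch=fetch_html):
    """Scrape `days` consecutive dates from `start`, skipping empty pages."""
    results = {}
    for offset in range(days):
        day = start + datetime.timedelta(days=offset)
        html = fetch(build_url(day))
        if not html:
            continue  # request failed: skip this date
        rows = parse_rows(html)
        if not rows:
            continue  # page exists but has no target table: skip it
        results[day] = rows
    return results
```

Calling scrape_range(datetime.date(2016, 12, 5), 5) would then try five consecutive dates and return a dict keyed by date, holding only the dates whose page had the table. Passing fetch as a parameter is just a convenience so the loop can be tried against saved HTML without hitting the site.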