I can scrape the target values for a date-specific URL. How should I set up datetime and the scraping so that URLs without the target table are skipped? This is the code I have so far:

import datetime

date = datetime.datetime.today()
url = "http://www.wsj.com/mdc/public/page/2_3022-mfsctrscan-moneyflow-20161205.html?mod=mdc_pastcalendar"

I know I can substitute {date} into the URL to make the date dynamic; I have supplied a static URL here in case the page for today's date is blank.
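A minimal sketch of that substitution, assuming the only part of the URL that changes is the eight-digit date. `URL_TEMPLATE` and `build_url` are hypothetical names used for illustration, not part of the original script.

```python
import datetime

# Assumed template: the static URL with its date replaced by a {date} placeholder.
URL_TEMPLATE = ("http://www.wsj.com/mdc/public/page/"
                "2_3022-mfsctrscan-moneyflow-{date}.html?mod=mdc_pastcalendar")


def build_url(day):
    """Return the page URL for a datetime.date (or datetime.datetime) value."""
    return URL_TEMPLATE.format(date=day.strftime('%Y%m%d'))


print(build_url(datetime.date(2016, 12, 5)))
```

Keeping the placeholder in a template means the same string can serve both today's date and any earlier one.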

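import requests
from bs4 import BeautifulSoup

# build the address once; format() changes nothing until the URL contains {date}
address = url.format(date=date.strftime('%Y%m%d'))
print('Retrieving information from: ' + address)
print('\n')
soup = BeautifulSoup(requests.get(address).content, "lxml")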

The scraping then proceeds as:

rows = soup.select('div#column0 table tr')[2:]

headers = ['name', 'last', 'chg', 'pct_chg',
       'total_money_flow', 'total_tick_up', 'total_tick_down', 'total_up_down_ratio',
       'block_money_flow', 'block_tick_up', 'block_tick_down', 'block_up_down_ratio']
for row in rows:
    # skip non-data rows (find() returns a Tag or None, never True)
    if row.find("td", class_="b14"):
        continue

    print(dict(zip(headers, [cell.get_text(strip=True) for cell in row.find_all('td')])))
Derek_P
  • Can't you use a `try` to skip the dates that don't have the target table? – Rafael Dec 08 '16 at 23:52
  • I think so, but what would the values look like for a script that fetches retroactive dates? If the URL were numerical, like URL.x, I could use n=1; for i in range(1, n+1), but I'm not sure how to manipulate it datetime-wise – Derek_P Dec 09 '16 at 00:03
  • Off the top of my head, you could make a separate file with the dates and read them in as a list, then iterate over that list, or use `timedelta` http://stackoverflow.com/questions/1712116/formatting-yesterdays-date-in-python – Rafael Dec 09 '16 at 00:08
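The two suggestions in the comments (stepping back with `timedelta` and skipping dates that have no table) could be combined roughly as follows. This is only a sketch under several assumptions: it targets Python 3, it fetches with the standard-library `urlopen` rather than `requests`, it uses the built-in `html.parser` instead of `lxml`, and it treats an empty row list as "no target table". The names `URL_TEMPLATE`, `previous_days`, `extract_rows` and `scrape` are hypothetical.

```python
import datetime
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Assumed template: the question's static URL with the date as a placeholder.
URL_TEMPLATE = ("http://www.wsj.com/mdc/public/page/"
                "2_3022-mfsctrscan-moneyflow-{date}.html?mod=mdc_pastcalendar")


def previous_days(start, count):
    """Yield `count` dates, beginning at `start` and stepping back one day each time."""
    for offset in range(count):
        yield start - datetime.timedelta(days=offset)


def extract_rows(html):
    """Return the rows after the two header rows, or [] if the table is absent."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.select('div#column0 table tr')[2:]


def scrape(start, count):
    """Visit `count` consecutive days ending at `start`, skipping unusable pages."""
    for day in previous_days(start, count):
        address = URL_TEMPLATE.format(date=day.strftime('%Y%m%d'))
        try:
            html = urlopen(address).read()
        except OSError:  # URLError and HTTPError are subclasses of OSError
            print('Skipping (request failed): ' + address)
            continue
        rows = extract_rows(html)
        if not rows:
            print('Skipping (no target table): ' + address)
            continue
        print('Retrieved %d rows from: %s' % (len(rows), address))
```

A call such as `scrape(datetime.date.today(), 10)` would try the last ten calendar days. Checking for an empty row list avoids relying on an exception, since `select` returns an empty list rather than raising when nothing matches.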

0 Answers