I need to read the content of a URL and search it for a pattern with a regex. For example, in corrupt_files.jsp I need to find any occurrence of the keyword "auction_log.DATE", where DATE is yesterday's date.

How can I achieve this?

Below is what I got so far:

from urllib import urlopen
import re
import time
import datetime
from datetime import date, timedelta
yesterday = date.today() - timedelta(1)

DATE= yesterday.strftime('%Y-%m-%d')

html = urlopen("http://url.com/corrupt_files.jsp").read()

for line in html.split('<tr'):
  re.search('auction_log.DATE',line)
Rio

1 Answer

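You can use BeautifulSoup or Scrapy to parse the page and search its text. Note that the string 'auction_log.DATE' is matched literally, so build the pattern from the DATE variable instead. For example, with BeautifulSoup:

from datetime import date, timedelta
import re
from urllib.request import urlopen  # Python 3; on Python 2 use: from urllib import urlopen
from bs4 import BeautifulSoup

DATE = (date.today() - timedelta(days=1)).strftime('%Y-%m-%d')
r = urlopen('http://url.com/corrupt_files.jsp').read()
soup = BeautifulSoup(r, 'html.parser')
# A compiled regex matches any text node that contains the keyword
soup.find_all(string=re.compile(re.escape('auction_log.' + DATE)))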
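
If an HTML parser is more than you need, a plain regex over the raw page text may be enough. The following is only a minimal sketch: it assumes the log names look like auction_log.YYYY-MM-DD (the date format used in the question), and the helper names yesterday_stamp and find_auction_logs, as well as the sample HTML, are made up for illustration.

```python
import re
from datetime import date, timedelta


def yesterday_stamp(today=None):
    """Return yesterday's date as 'YYYY-MM-DD' (format taken from the question)."""
    today = today or date.today()
    return (today - timedelta(days=1)).strftime('%Y-%m-%d')


def find_auction_logs(html, stamp):
    """Return the '<tr' chunks of html that mention auction_log.<stamp>."""
    # re.escape keeps the '.' literal instead of meaning "any character".
    pattern = re.compile(re.escape('auction_log.' + stamp))
    return [row for row in html.split('<tr') if pattern.search(row)]


# Hypothetical sample standing in for the real page content.
sample = (
    '<table>'
    '<tr><td>auction_log.2024-01-30</td></tr>'
    '<tr><td>auction_log.2024-01-31</td></tr>'
    '</table>'
)
stamp = yesterday_stamp(date(2024, 2, 1))
hits = find_auction_logs(sample, stamp)
print(len(hits))
```

For the real page, you would pass the decoded response body (for example urlopen(url).read().decode('utf-8', 'replace') on Python 3) as html and call yesterday_stamp() with no argument.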
molivier