Here is my problem (sorry for my English).
I have to connect to a server every day to retrieve content.
The page I connect to looks like this:
<tr><td><a href='https://www.test.com/thing1.xlsx' target='_blank'>thing1.xlsx</a><td>01 September 2019 10:02:03 /td><td>1 KB</td></tr>
<tr><td><a href='https://www.test.com/thing2.pdf' target='_blank'>thing2.pdf</a><td>02 September 2019 10:02:03 /td><td>1 KB</td></tr>
<tr><td><a href='https://www.test.com/thing test 3.pdf' target='_blank'>thing test 3.pdf</a><td>04 September 2019 10:02:03 /td><td>1 KB</td></tr>
<tr><td><a href='https://www.test.com/thing test 4.pdf' target='_blank'>thing test 4.pdf</a><td>04 September 2019 10:02:04 /td><td>1 KB</td></tr>
<tr><td><a href='https://www.test.com/thing test 5.pdf' target='_blank'>thing test 5.pdf</a><td>04 September 2019 10:02:05 /td><td>1 KB</td></tr>
From this page (new rows are added continuously), I need to retrieve the URLs (the href values) of the files dated today. For example, if today is 04 September, I need to get my 3 files: "thing test 3.pdf", "thing test 4.pdf" and "thing test 5.pdf" (note that some of these URLs contain spaces).
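Because the dates are written as "04 September 2019", one way to test a row is to pull that part out of the cell text and compare it with today's date. This is a minimal sketch that assumes the day/month/year format stays exactly like the sample and that month names are in English (`%B` follows the default C locale). The helper names `cell_date` and `is_today` are hypothetical, not part of lxml:

```python
import re
from datetime import date, datetime

# Matches the "04 September 2019" part of a cell such as
# "04 September 2019 10:02:03 /td>"; anything after it is ignored.
DATE_RE = re.compile(r"\d{1,2} [A-Za-z]+ \d{4}")


def cell_date(text):
    """Return the date found in a table cell's text, or None if there is none."""
    match = DATE_RE.search(text or "")
    if match is None:
        return None
    # %B expects English month names under the default "C" locale.
    return datetime.strptime(match.group(0), "%d %B %Y").date()


def is_today(text, today=None):
    """True if the cell's date equals `today` (defaults to date.today())."""
    if today is None:
        today = date.today()
    return cell_date(text) == today


print(cell_date("04 September 2019 10:02:03 /td>"))
print(is_today("04 September 2019 10:02:05", today=date(2019, 9, 4)))
```

Passing `today` explicitly makes the check easy to try against the sample dates; in the daily script you would simply omit it.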
I started writing a script in Python (with lxml), but I'm a beginner and could use some help:
# coding: utf-8
from lxml import etree, html
parser = etree.HTMLParser()
tree = etree.parse("test.html", parser)
URL = tree.xpath('//a/@href')  # every link target (href) on the page
NAMEFILE = tree.xpath('//a/text()')  # every link text (the file names)
print(URL)
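Building on the script above, a possible approach is to walk the rows (`tr`) instead of collecting every `a` at once, so each href stays paired with the date in the same row. This is a sketch under assumptions: the rows sit inside a `<table>`, each row has one link plus a date cell, and the date format matches the sample. The function names `row_date` and `urls_for_day` are hypothetical. Searching every non-link cell with a regular expression keeps it working even with the malformed `/td>` closing tags shown in the sample markup:

```python
import re
from datetime import date, datetime

from lxml import etree

# "04 September 2019" inside the date cell; trailing text like "/td>" is ignored.
DATE_RE = re.compile(r"\d{1,2} [A-Za-z]+ \d{4}")


def row_date(row):
    """Return the first date found in the row's cells that hold no link, or None."""
    for cell in row.xpath(".//td[not(.//a)]"):
        match = DATE_RE.search("".join(cell.itertext()))
        if match:
            return datetime.strptime(match.group(0), "%d %B %Y").date()
    return None


def urls_for_day(tree, day):
    """Collect the href of every row whose date equals `day`."""
    urls = []
    for row in tree.iter("tr"):
        if row_date(row) == day:
            urls.extend(row.xpath(".//a/@href"))
    return urls


# A few rows copied from the question, wrapped in a <table>.
SAMPLE = """<table>
<tr><td><a href='https://www.test.com/thing2.pdf' target='_blank'>thing2.pdf</a><td>02 September 2019 10:02:03 /td><td>1 KB</td></tr>
<tr><td><a href='https://www.test.com/thing test 3.pdf' target='_blank'>thing test 3.pdf</a><td>04 September 2019 10:02:03 /td><td>1 KB</td></tr>
<tr><td><a href='https://www.test.com/thing test 4.pdf' target='_blank'>thing test 4.pdf</a><td>04 September 2019 10:02:04 /td><td>1 KB</td></tr>
</table>"""

tree = etree.fromstring(SAMPLE, etree.HTMLParser())
todays = urls_for_day(tree, date(2019, 9, 4))
print(todays)
```

On the real page you would build the tree with `etree.parse("test.html", etree.HTMLParser())` as in the original script and pass `date.today()` instead of the fixed date.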
I can get all the URLs, but I don't know how to keep only the ones dated today. Any ideas?
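Since some of the hrefs contain spaces, they need percent-encoding (a space becomes `%20`) before being requested. If the next step is downloading the selected files, a sketch using only the standard library could look like this. `encode_url` and `download` are hypothetical helpers, and the sketch assumes the hrefs are not already percent-encoded (an existing `%` would be encoded a second time):

```python
import os
from urllib.parse import quote, urlsplit, urlunsplit
from urllib.request import urlopen


def encode_url(url):
    """Percent-encode only the URL path (spaces -> %20); scheme and host stay as-is."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=quote(parts.path)))


def download(url, folder="."):
    """Save one file in `folder` under its original name (spaces included)."""
    name = os.path.basename(urlsplit(url).path)
    with urlopen(encode_url(url)) as response:
        data = response.read()
    with open(os.path.join(folder, name), "wb") as out:
        out.write(data)
    return name


print(encode_url("https://www.test.com/thing test 3.pdf"))
```

Encoding only the path, rather than the whole string, avoids touching the `https://` part of the URL.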