HTTP: download all files that start with a string

Asked Aug 16 '13 at 16:58

Active Aug 16 '13 at 16:58

Viewed 103 times

I need to download a number of files from an HTTP location using urllib2. The file names follow the pattern "my_file_$TIMESTAMP.xml", and I'd like to get every file that starts with "my_file", whatever its timestamp component is.

I've seen how this can be done with glob for local files (see "Find all files in a directory with extension .txt in Python"), but I'm not sure whether it is possible over an HTTP connection with basic authentication.

edited May 23 '17 at 12:20

Community

asked Aug 16 '13 at 16:58

Brad


Is this HTTP or FTP? They're different protocols. – user2357112 Aug 16 '13 at 16:59

Without some kind of sitemap or directory index, this is pretty much impossible. (Short of brute force requesting the huge number of possibilities.) – Wooble Aug 16 '13 at 17:05
@Wooble that is *not* what I want to hear :) – Brad Aug 16 '13 at 17:06
If you don't have a directory listing but you know the files are there, maybe you could follow the same procedure you use to determine which files are there, with a scraping library? – Joseph Dunn Aug 16 '13 at 17:16
If you know what the min/max timestamps are, just set up a cloud to step through all timestamps between them (I'm assuming numeric timestamps). It probably shouldn't take more than a few hours. – Joran Beasley Aug 16 '13 at 17:17
@JoranBeasley That won't work for me because these operations will need to be performed and repeated on the order of seconds. – Brad Aug 16 '13 at 17:18
Do you have any control over the server? – Joran Beasley Aug 16 '13 at 17:19
@JoranBeasley Unfortunately no. It's a 3rd party's. – Brad Aug 16 '13 at 17:20
If you just go to http://url.com/folder_with_xmls/, do you see a listing of them? If so, you can just scrape that with requests (or urllib). Where are you trying to scrape these from? – Joran Beasley Aug 16 '13 at 17:21
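
If the folder URL does return a directory listing, the scraping approach suggested in the comments could look roughly like the sketch below. It is only a sketch under several assumptions: the listing is an ordinary HTML page whose `<a href>` links point at the files, the server accepts HTTP Basic credentials sent in an `Authorization` header, and Python 3's `urllib.request` stands in for the `urllib2` module named in the question. The helper names (`LinkCollector`, `matching_links`, `fetch`, `download_all`) are made up for illustration.

```python
import base64
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def matching_links(html, prefix="my_file", suffix=".xml"):
    """Return hrefs whose file name starts with prefix and ends with suffix."""
    parser = LinkCollector()
    parser.feed(html)
    matches = []
    for href in parser.hrefs:
        # Drop any query string, then keep only the last path component.
        name = href.split("?", 1)[0].rstrip("/").rsplit("/", 1)[-1]
        if name.startswith(prefix) and name.endswith(suffix):
            matches.append(href)
    return matches


def fetch(url, user, password):
    """GET a URL with a Basic Authorization header and return the body bytes."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": "Basic " + token}
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def download_all(index_url, user, password):
    """Download every matching file linked from the listing at index_url."""
    listing = fetch(index_url, user, password).decode("utf-8", errors="replace")
    files = {}
    for href in matching_links(listing):
        # Resolve relative links against the listing URL.
        file_url = urljoin(index_url, href)
        files[file_url] = fetch(file_url, user, password)
    return files
```

Something like `download_all("http://url.com/folder_with_xmls/", "user", "secret")` (placeholder credentials) would then return a dict mapping each file URL to its contents. If the server does not expose a listing, there is nothing to scrape and this does not help, as noted in the comments above.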


0 Answers