I need to download a number of files from an HTTP location using urllib2. The file names follow the pattern "my_file_$TIMESTAMP.xml", and I'd like to get every file that starts with "my_file", regardless of the timestamp component.

I've seen how this can be done with glob for local files (see "Find all files in a directory with extension .txt in Python"), but I'm not sure whether it is possible over an HTTP connection with basic authentication.

Brad
  • Is this HTTP or FTP? They're different protocols. – user2357112 Aug 16 '13 at 16:59
  • Without some kind of sitemap or directory index, this is pretty much impossible (short of brute-force requesting the huge number of possibilities). – Wooble Aug 16 '13 at 17:05
  • @Wooble that is *not* what I want to hear :) – Brad Aug 16 '13 at 17:06
  • If you don't have a directory listing but you know the files are there, maybe you could follow the same procedure you use to determine which files are there, with a scraping library? – Joseph Dunn Aug 16 '13 at 17:16
  • If you know what the min/max timestamps are, just set up a cloud to step through all timestamps in between (I'm assuming numeric timestamps). It probably shouldn't take more than a few hours. – Joran Beasley Aug 16 '13 at 17:17
  • @JoranBeasley That won't work for me, because these operations will need to be performed and repeated on the order of seconds. – Brad Aug 16 '13 at 17:18
  • Do you have any control over the server? – Joran Beasley Aug 16 '13 at 17:19
  • @JoranBeasley Unfortunately no. It's a third party's. – Brad Aug 16 '13 at 17:20
  • If you just go to http://url.com/folder_with_xmls/, do you see a listing of them? If so, you can just scrape that with requests (or urllib). Where are you trying to scrape these from? – Joran Beasley Aug 16 '13 at 17:21
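
The directory-listing idea from the last comment could be sketched roughly as follows. This is only a sketch under stated assumptions: it assumes the server actually returns an HTML index page with one link per file (nothing in the thread confirms that), it uses Python 3's urllib.request (the successor to urllib2) rather than urllib2 itself, and the names LinkCollector, matching_links, make_opener and download_matching are made up for illustration. The glob-style pattern is applied on the client side to the link names, since HTTP itself has no wildcard requests.

```python
import fnmatch
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import (
    HTTPBasicAuthHandler,
    HTTPPasswordMgrWithDefaultRealm,
    build_opener,
)


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def matching_links(html, pattern="my_file_*.xml"):
    """Return the links whose final path component matches the glob pattern."""
    parser = LinkCollector()
    parser.feed(html)
    return [
        link
        for link in parser.links
        if fnmatch.fnmatchcase(link.rsplit("/", 1)[-1], pattern)
    ]


def make_opener(base_url, user, password):
    """Build an opener that answers basic-auth challenges for base_url."""
    manager = HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, base_url, user, password)
    return build_opener(HTTPBasicAuthHandler(manager))


def download_matching(base_url, user, password, pattern="my_file_*.xml"):
    """Fetch the index page (assumed to exist), then each matching file."""
    opener = make_opener(base_url, user, password)
    with opener.open(base_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    files = {}
    for link in matching_links(html, pattern):
        # Links in a listing are usually relative, so resolve them first.
        with opener.open(urljoin(base_url, link)) as response:
            files[link] = response.read()
    return files
```

Something like download_matching("http://url.com/folder_with_xmls/", "user", "secret") would then return a dict mapping each matching link to its raw bytes, where the URL is the placeholder from the comment above and the credentials are invented. If the server does not expose a listing, this does not help, which is the point Wooble makes earlier in the thread.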

0 Answers