
I want to download all .mp4 files from a single URL. I have seen examples of how to download a file with urllib, but they look something like this:

urllib.request.urlretrieve('http://example.com/big.zip', 'file/on/disk.zip')

These examples specify the exact file to download, big.zip. I don't know the name of every file in the directory on the site; I only know the file extension.

I would like to be able to put in something like this for the website:

urllib.request.urlretrieve('http://example.com/videos/', 'file/on/disk')

And then download all of the .mp4 files. I believe I can use .endswith to filter by file extension.

I am still new to urllib and have never used BeautifulSoup, though I've seen it used in several examples, so I don't know whether this can be done at all.

Downloading files from multiple websites.

urllib.request for python 3.3 not working to download file

How do I download a file over HTTP using Python?

Andrew
  • There is no standard way to list all the files a site will serve. This is intentional: bulk downloading is generally terrible for site owners, because it consumes a lot of bandwidth. See if the site has an API or archiving system for this kind of thing. If not, you probably want to contact the site owner to check that this is a legitimate use of their content. Some sites may list all files, but this is uncommon for security reasons, and because many sites are not simply backed by file systems. – Gareth Latty Oct 16 '14 at 18:06
  • You could find all the links with BeautifulSoup. – Padraic Cunningham Oct 16 '14 at 18:23
  • @PadraicCunningham I'll have to look into BeautifulSoup. I was hoping to use the modules included in Python, but it looks easier with BeautifulSoup. – Andrew Oct 16 '14 at 18:24
  • It is very easy to find links with BeautifulSoup. Using requests and BeautifulSoup, four or five lines of code would probably get you all you need. – Padraic Cunningham Oct 16 '14 at 18:26

1 Answer

0

If the server exposes a directory listing (for example, an index page generated by Apache), parse that output, build a list of files, and call a single-file download routine in a loop.

If it does not, you cannot do this: without a listing there is no way to enumerate the files, and there is a reason apps usually hide their file structure from users.
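The parse-then-loop approach could be sketched roughly as follows, using only standard-library modules. This is a minimal sketch, not a definitive implementation: it assumes the page at the given URL is a plain HTML index whose `<a href>` links point at the files. The names `LinkCollector`, `find_links`, and `download_all` are hypothetical helpers made up for this illustration.

```python
import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import unquote, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_links(html, base_url, extension=".mp4"):
    """Return absolute URLs of links whose path ends with `extension`."""
    parser = LinkCollector()
    parser.feed(html)
    urls = []
    for href in parser.hrefs:
        # Resolve relative links against the page they were found on.
        url = urljoin(base_url, href)
        # Compare only the path, so a query string does not hide the extension.
        if urlsplit(url).path.lower().endswith(extension):
            urls.append(url)
    return urls


def download_all(page_url, dest_dir, extension=".mp4"):
    """Download every matching file linked from `page_url` into `dest_dir`."""
    with urllib.request.urlopen(page_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    os.makedirs(dest_dir, exist_ok=True)
    for url in find_links(html, page_url, extension):
        # Name the local file after the last segment of the URL path.
        filename = os.path.basename(unquote(urlsplit(url).path))
        urllib.request.urlretrieve(url, os.path.join(dest_dir, filename))
```

With the placeholder URL from the question, the call would be `download_all('http://example.com/videos/', 'file/on/disk')`. If the server does not return an index page with links, `find_links` simply returns an empty list and nothing is downloaded.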

Łukasz Rogalski
  • I thought I might have to parse the HTML output and pull the matching file names out of the code and loop over them. I just didn't want to do all of that work if there was an easier way. – Andrew Oct 16 '14 at 18:10