I'm trying to download files (both images and documents) from a website I have scraped, and I need to save them to a specific folder. So far I have:

images = re.findall("/([^/]+\.(?:jpg|gif|png))", html)
output = open("output.txt","a+")
output.write("\n" + f"[+] {len(images)} Images Found:" + "\n")
for images in images:
    output.write(images + "\n")
    output.write("Beginning file download with urllib2..." + "\n")
    imageurl = "images"
    urllib.request.urlretrieve(url, "/downloads")

How would I keep each file name the same as it is on the website, including the file extension?

This is just a snippet of the code; it handles the images only.

Skye Smith
  • Possible duplicate of [python httplib/urllib get filename](https://stackoverflow.com/questions/11783269/python-httplib-urllib-get-filename) – Freshollie Nov 30 '17 at 17:13

1 Answer

You can pass the output file name, including the folder, as the second argument to urllib.request.urlretrieve.

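import os
import re
import urllib.request
from urllib.parse import urljoin

# html and url (the address of the scraped page) come from the earlier scraping code
images = re.findall(r"/([^/]+\.(?:jpg|gif|png))", html)
with open("output.txt", "a+") as output:
    output.write("\n" + f"[+] {len(images)} Images Found:" + "\n")
    for imagename in images:  # separate loop variable, so the images list is not overwritten
        output.write(imagename + "\n")
        output.write("Beginning file download with urllib..." + "\n")
        # Assumption: the image sits in the same directory as the page; adjust if it does not
        imageurl = urljoin(url, imagename)
        urllib.request.urlretrieve(imageurl, os.path.join("/downloads", imagename))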

You only need to set the variable imagename to the file name of the image, for example image.png.
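If you have the full URL of each file rather than just its name, the name can be taken from the URL itself. The following is a minimal sketch, not a definitive implementation: filename_from_url and download_to_folder are hypothetical helper names, and it assumes the last path segment of the URL is the name you want to keep.

```python
import os
import urllib.request
from urllib.parse import unquote, urlsplit


def filename_from_url(file_url):
    """Return the last path segment of a URL, ignoring query string and fragment."""
    path = urlsplit(file_url).path
    # unquote turns escapes such as %20 back into the original characters
    return os.path.basename(unquote(path))


def download_to_folder(file_url, folder):
    """Download file_url into folder, keeping the name used on the website."""
    os.makedirs(folder, exist_ok=True)  # create the target folder if it is missing
    target = os.path.join(folder, filename_from_url(file_url))
    urllib.request.urlretrieve(file_url, target)
    return target


# Hypothetical usage (example.com is a placeholder):
# download_to_folder("https://example.com/img/photo.png", "downloads")
```

Note that a URL ending in a slash yields an empty name, so you may want to supply a fallback name in that case.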

I hope this helps.

domx4q