
I've tried using wget in Python to download links listed in a text file. What should I use to help me do this?

I've been using the wget Python module.

import requests
from bs4 import BeautifulSoup

r = requests.get(url)  # url: the page whose links are being collected
html = r.text

soup = BeautifulSoup(html, 'html.parser')
s = "https://google.com/"

with open("output.txt", "a") as f:  # open the file once, not once per link
    for link in soup.find_all('a'):
        print(s, link.get('href'), sep='', file=f)

So far I've only been able to create the text file and then download the files with wget.exe in the command prompt. I'd like to be able to do all of this in one step.
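As an aside, joining the base with plain string concatenation (as in my loop above) breaks for hrefs that are already absolute; the standard library's urljoin handles both cases. A minimal sketch, using the placeholder base URL from my code:

```python
from urllib.parse import urljoin

base = "https://google.com/"  # the placeholder base from the code above

# Relative hrefs are resolved against the base...
print(urljoin(base, "images/logo.png"))        # https://google.com/images/logo.png
# ...root-relative hrefs replace the path...
print(urljoin(base, "/search?q=x"))            # https://google.com/search?q=x
# ...and hrefs that are already absolute are left alone.
print(urljoin(base, "https://example.com/a"))  # https://example.com/a
```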

1 Answer

Since you're already using the third-party requests library, just use that:

import requests
from os.path import basename

with open('output.txt') as urls:
    for url in urls:
        url = url.strip()  # drop the trailing newline from each line
        response = requests.get(url)
        filename = basename(url)
        with open(filename, 'wb') as output:
            output.write(response.content)

This code makes many assumptions:

  • The end of each URL must be a unique name, since basename is used to create the name of the downloaded file, e.g. basename('https://i.imgur.com/7ljexwX.gifv') gives '7ljexwX.gifv'.
  • The content is assumed to be binary, not text, so the output file is opened with 'wb', meaning 'write binary'.
  • The response isn't checked to make sure there were no errors.
  • If the content is large, it will all be loaded into memory before being written to the output file, which may not be very efficient. There are other questions on this site which address that.
  • I also haven't actually tried running this code.
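A variant that addresses the error-checking and memory caveats above, sketched only: it streams the body in chunks and raises on HTTP errors. filename_from_url is a helper introduced here, not part of requests, and it falls back to a made-up name when the path is empty:

```python
from os.path import basename
from urllib.parse import urlparse

import requests  # third-party, as used above


def filename_from_url(url):
    """Derive a local filename from the URL's path, ignoring any query string."""
    name = basename(urlparse(url.strip()).path)
    return name or "index.html"  # fallback when the path ends in '/'


def download(url, chunk_size=8192):
    # stream=True avoids loading the whole body into memory at once
    response = requests.get(url.strip(), stream=True)
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    filename = filename_from_url(url)
    with open(filename, 'wb') as output:
        for chunk in response.iter_content(chunk_size=chunk_size):
            output.write(chunk)
    return filename

# Usage (not run here):
#     with open('output.txt') as urls:
#         for url in urls:
#             download(url)
```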
Peter Wood