I have the following script, and I would like to read the URLs from a text file rather than from a hard-coded list. I'm new to Python and keep getting stuck!

from bs4 import BeautifulSoup
import requests
urls = ['URL1',
        'URL2',
        'URL3']
for u in urls:
   response = requests.get(u)
   data = response.text
   soup = BeautifulSoup(data,'lxml')
Teege

1 Answer

Could you please be a little more clear about what you want?

Here is a possible answer which might or might not be what you want:

from bs4 import BeautifulSoup
import requests
with open('yourfilename.txt', 'r') as url_file:
   for line in url_file:
      u = line.strip()
      response = requests.get(u)
      data = response.text
      soup = BeautifulSoup(data,'lxml')

The file is opened with the open() function; the second argument, 'r', specifies that it is opened in read-only mode. The call to open() is wrapped in a with block so that the file is closed automatically when the block exits. Iterating over the file object yields one line at a time, including its trailing newline. The strip() method removes leading and trailing whitespace (spaces, tabs, newlines) from each line; for instance, ' https://stackoverflow.com '.strip() becomes 'https://stackoverflow.com'.
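One case the snippet above does not cover is a blank line in the file: after strip() it becomes an empty string, and requests.get('') raises an error. A minimal sketch of one way to handle that, assuming the file holds one URL per line; the helper name read_urls is made up for illustration and is not part of requests or BeautifulSoup:

```python
def read_urls(path):
    """Return the non-blank lines of a text file, stripped of surrounding whitespace.

    read_urls is a hypothetical helper; it assumes one URL per line.
    """
    urls = []
    with open(path, 'r') as url_file:
        for line in url_file:
            u = line.strip()
            if u:  # skip blank lines, which would make requests.get() fail
                urls.append(u)
    return urls
```

With this in place, the loop from the question can stay as it is, with `for u in read_urls('yourfilename.txt'):` replacing `for u in urls:`.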

Stef