
I'm new to Python, and I am having problems using BeautifulSoup to scrape multiple URLs, whether from a text file or hard-coded into the program. Here is an example of my code.

import requests 
from bs4 import BeautifulSoup
import re
  
url = 'https://0.0.0.0/directory/'
r = requests.get(url) 
soup = BeautifulSoup(r.content, 'html5lib') 

with open("1.txt", "w") as f:

    for name, date in zip(
        soup.find_all("a", {"class": "name"}), soup.find_all("span", {"class": "date"})
    ):
        f.write(name.text.strip() + " ")
        f.write(date.text.strip() + "\n")

This works great for one URL, but it fails when I add a second, and it also fails when I try to load a list of URLs from a text file. I have about 25 URLs in a file that I would like the program to run through and collect from daily.

The failing multiple-URL code:

url = ['https://0.0.0.0/directory/', 'https://0.0.0.0/directory/']

Error message:

┌──(c4㉿ib)-[~/Desktop/dev]
└─$ python3 test.py
Traceback (most recent call last):
  File "crime.py", line 9, in <module>
    r = requests.get(url) 
  File "/usr/lib/python3/dist-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 637, in send
    adapter = self.get_adapter(url=request.url)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 728, in get_adapter
    raise InvalidSchema("No connection adapters were found for {!r}".format(url))
requests.exceptions.InvalidSchema: No connection adapters were found for "['https://0.0.0.0/directory/', 'https://0.0.0.0/directory/']"

Clearly I am not scraping 0.0.0.0; I renamed the domain for the question. Any advice on what I am doing wrong would be helpful. I would rather read from a list so my code doesn't have 25 URLs stuffed into it. Thank you.

slurm
  • Does this answer your question? [Python Requests - No connection adapters](https://stackoverflow.com/questions/15115328/python-requests-no-connection-adapters) – MendelG Aug 26 '20 at 01:58
  • Doing this: url = ''' https://www.0.0.0.0 ''' doesn't throw the error, but it also doesn't save anymore to the file like it was before. – slurm Aug 26 '20 at 02:09
  • have you tried looping through the URLs? – MendelG Aug 26 '20 at 02:11

1 Answer


The error happens because `requests.get` expects a single URL string, not a list. Try looping through the URLs and requesting each one separately:

import requests
from bs4 import BeautifulSoup

urls = ['https://0.0.0.0/directory/', 'https://0.0.0.0/directory/']

with open("output.txt", "w") as f:
    for url in urls:
        print(url)
        # Fetch and parse each page individually
        resp = requests.get(url).content
        soup = BeautifulSoup(resp, "html.parser")

        # Pair each name with its date and write one entry per line
        for name, date in zip(
            soup.find_all("a", {"class": "name"}), soup.find_all("span", {"class": "date"})
        ):
            f.write(name.text.strip() + " ")
            f.write(date.text.strip() + "\n")
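Since you also want to load the 25 URLs from a text file rather than hard-coding them, here is a minimal sketch of that part (the filename `urls.txt` is an assumption; the file is expected to hold one URL per line). The resulting `urls` list drops straight into the loop above:

```python
# For demonstration only: create a sample urls.txt. In practice this
# file already exists with your 25 URLs, one per line.
with open("urls.txt", "w") as f:
    f.write("https://0.0.0.0/directory/\n")
    f.write("https://0.0.0.0/directory/\n")

# Read one URL per line, stripping whitespace and skipping blank lines.
with open("urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

print(urls)  # ['https://0.0.0.0/directory/', 'https://0.0.0.0/directory/']
```

Stripping each line matters: a trailing newline left in the URL would also make `requests.get` fail.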
MendelG