
I want to crawl the website http://berlin.startups-list.com/startups/mobile and collect a list of the hrefs on the page. I use Python 3.5 and Beautiful Soup.

I already scraped the website https://www.kickstarter.com successfully with this code:

#Loading Libraries
import urllib.request
from bs4 import BeautifulSoup



#define URL for scraping
theurl1 = "http://berlin.startups-list.com/startups/mobile"
thepage1 = urllib.request.urlopen(theurl1)

#Cooking the Soup
soup1 = BeautifulSoup(thepage1,"html.parser")

#-------------------------------------------------------------------------------------------------------------------
#Scraping

#Scraping "Link" (href)
href_Kunst = [i.a['href'] for i in soup1.find_all('div', attrs={'class' : 'project-thumbnail'})]
print(href_Kunst)

This code works for Kickstarter!

But I can't access http://berlin.startups-list.com/startups/mobile. Even without the scraping part of the code, I can't open the website with urllib and Beautiful Soup.

The first part of the code gives me the following traceback:

Traceback (most recent call last):
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1254, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1106, in request
    self._send_request(method, url, body, headers)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1151, in _send_request
    self.endheaders(body)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1102, in endheaders
    self._send_output(message_body)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 934, in _send_output
    self.send(msg)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 877, in send
    self.connect()
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 849, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\socket.py", line 711, in create_connection
    raise err
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\socket.py", line 702, in create_connection
    sock.connect(sa)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\A80881\workspace\Startup List\Berlin_Mobile\__init__.py", line 16, in <module>
    thepage1 = urllib.request.urlopen(theurl1)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 466, in open
    response = self._open(req, data)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 484, in _open
    '_open', req)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 444, in _call_chain
    result = func(*args)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1282, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1256, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>

Am I loading the website the wrong way? Does anyone have any ideas? Thanks for your help.
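To narrow the problem down, the fetch can be separated from the parsing. The following is only a minimal sketch, not a confirmed fix: the `fetch` helper, the 10-second timeout, and the User-Agent value are assumptions made for illustration. It uses only the standard library and reports a failed request instead of letting the traceback propagate.

```python
import socket
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Return the response body as bytes, or None if the request fails."""
    # Assumed header value for illustration; whether the server cares
    # about the User-Agent is not known from the traceback.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        # An explicit timeout keeps the call from blocking indefinitely.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except (urllib.error.URLError, socket.timeout) as exc:
        # Connection failures such as WinError 10060 arrive wrapped in URLError.
        print("Request failed:", exc)
        return None
```

Calling `fetch("http://berlin.startups-list.com/startups/mobile")` and checking for `None` before handing the result to `BeautifulSoup(page, "html.parser")` would show whether the failure happens at the connection stage or later in the parsing.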

Sebastian Fischer