I want to crawl http://berlin.startups-list.com/startups/mobile and collect a list of the hrefs on that page. I use Python 3.5 and Beautiful Soup.
I already scraped https://www.kickstarter.com with this code:
# Load libraries
import urllib.request
from bs4 import BeautifulSoup

# Define the URL to scrape
theurl1 = "http://berlin.startups-list.com/startups/mobile"
thepage1 = urllib.request.urlopen(theurl1)

# Parse the page
soup1 = BeautifulSoup(thepage1, "html.parser")

# Scrape the link (href) inside each project thumbnail
href_Kunst = [i.a['href'] for i in soup1.find_all('div', attrs={'class': 'project-thumbnail'})]
print(href_Kunst)
This code works!
But I can't access http://berlin.startups-list.com/startups/mobile. Even without the scraping part of the code, I can't open the website with urllib and Beautiful Soup.
The first part of the code gives me the following traceback:
Traceback (most recent call last):
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1254, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1106, in request
    self._send_request(method, url, body, headers)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1151, in _send_request
    self.endheaders(body)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1102, in endheaders
    self._send_output(message_body)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 934, in _send_output
    self.send(msg)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 877, in send
    self.connect()
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 849, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\socket.py", line 711, in create_connection
    raise err
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\socket.py", line 702, in create_connection
    sock.connect(sa)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Users\A80881\workspace\Startup List\Berlin_Mobile\__init__.py", line 16, in <module>
    thepage1 = urllib.request.urlopen(theurl1)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 466, in open
    response = self._open(req, data)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 484, in _open
    '_open', req)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 444, in _call_chain
    result = func(*args)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1282, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1256, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>
Am I loading the website the wrong way? Does anyone have any ideas? Thanks for your help.
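To separate the connection step from the parsing, here is a minimal sketch of how the fetch could be tested on its own. It is not a fix. The `fetch_html` name, the 10-second timeout, the `User-Agent` value, and the `opener` parameter (there only so the function can be tried without a network) are all my own assumptions, and if the host is simply unreachable from my network this would still time out.

```python
import urllib.request


def fetch_html(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return (html, error); exactly one of the two is None."""
    # Assumption: some servers treat urllib's default User-Agent differently,
    # so a browser-like value is sent. The exact string is hypothetical.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        # An explicit timeout makes the call fail after a known delay
        # instead of waiting for the operating system's default.
        with opener(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace"), None
    except OSError as exc:
        # urllib.error.URLError and socket timeouts are both OSError subclasses.
        return None, exc
```

Calling `fetch_html("http://berlin.startups-list.com/startups/mobile")` should then return either the page source or the exception object, so the connection problem can be looked at without Beautiful Soup being involved.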