I am trying to scrape data from a site using BeautifulSoup and requests on Python 3.5 (I'm working in Eclipse). The site is 'http://www.transfermarkt.com/arsenal-fc/startseite/verein/11/saison_id/2015', which has some stats for footballers.

My code:

from bs4 import BeautifulSoup
import requests

r = requests.get('http://www.transfermarkt.com/arsenalfc/startseite/verein/11/saison_id/2015')
soup = BeautifulSoup(r.content, 'html.parser')
print(soup.prettify())

I expected the page's HTML, neatly formatted, but all I get as output is this:

<html>
 <head>
  <title>
   404 Not Found
  </title>
 </head>
 <body bgcolor="white">
  <center>
   <h1>
    404 Not Found
   </h1>
  </center>
  <hr>
   <center>
    nginx
   </center>
  </hr>
 </body>
</html>

I have tried a couple of other URLs and the same code worked for them, but not for this one. Am I doing something wrong? Any suggestions are appreciated. Thanks.

Monkey D Luffy

1 Answer

The site appears to reject requests that do not look like they come from a browser. Send a User-Agent header with the request so the website treats it as browser traffic. This worked for me:

from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}
r = requests.get('http://www.transfermarkt.com/arsenalfc/startseite/verein/11/saison_id/2015', headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
print(soup.prettify())
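
To check that the header is actually attached before parsing anything, the same idea can be sketched with only the standard library. This is a minimal sketch, not part of the tested answer above: the helper names `build_request` and `fetch_html` and the constant `BROWSER_UA` are made up for illustration, and `urllib.request` stands in for requests here.

```python
from urllib.request import Request, urlopen

# Browser-like User-Agent string, copied from the answer above.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request object that carries a browser-like User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Fetch url and return the response body as bytes.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a 404
    surfaces as an exception instead of being parsed as if it were the page.
    """
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

Calling `fetch_html(...)` with the URL from the question and passing the result to `BeautifulSoup(html, 'html.parser')` should behave like the requests version. With requests itself, comparing `r.status_code` to 200 or calling `r.raise_for_status()` before parsing gives the same early warning.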
Walid Saad
  • Walid, I tried the exact same code and it did not work for me. It fails with "return codecs.charmap_encode(input,self.errors,encoding_table)[0] UnicodeEncodeError: 'charmap' codec can't encode character '\u2261' in position 15463: character maps to <undefined>" – Monkey D Luffy Feb 18 '16 at 01:40
  • Check this answer: http://stackoverflow.com/questions/27092833/unicodeencodeerror-charmap-codec-cant-encode-characters?rq=1 – Walid Saad Feb 18 '16 at 02:15
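
The UnicodeEncodeError in the comment above comes from `print`, not from the request: the page contains '\u2261', which the console's 'charmap' encoding cannot represent. One possible workaround, sketched here under the assumption that the console uses a Windows code page such as cp1252, is to replace unencodable characters before printing. The helper name `console_safe` is hypothetical.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the target encoding cannot represent replaced by '?'.

    If no encoding is given, fall back to the console's encoding, then to UTF-8.
    """
    encoding = encoding or sys.stdout.encoding or "utf-8"
    # errors="replace" substitutes '?' instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors="replace").decode(encoding)
```

With this in place, `print(console_safe(soup.prettify()))` should no longer raise, at the cost of showing '?' for the characters the console cannot display. Writing the output to a file opened with `encoding='utf-8'` keeps the characters intact instead.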