Solved using the answer from QHarr!

I'm trying to extract some information (starting with the title) from a website. The code below works fine with http://google.com, but not with the link I need (url).

Error code: "HTTP Error 500: Internal Server Error"

Am I doing something wrong? Is it possible to do this another way?

from urllib.request import urlopen
import urllib.error
import bs4
import time

url = "http://st.atb.no/New/minskjerm/FST.aspx?visMode=1&cTit=&c1=1&s1=16011301&sv1=&cn1=&template=2&cmhb=FF6600&cmhc=00FF00&cshb=3366FF&cshc=FFFFFF&arb=000000&rows=1&period=&" 


for i in range(5):  # Try up to 5 times to reach the page
    try:
        html = urlopen(url)
    except urllib.error.HTTPError as exc:
        print('Error code: ', exc)
        time.sleep(1)  # Wait 1 second, then make the HTTP request again
        continue
    else:
        print('Success')
        break


soup = bs4.BeautifulSoup(html, 'lxml')
title = soup.find('title')
print(title.getText()) 


jacobara
  • An HTTP 5xx status code means that something went wrong on the server side. Without looking at the server logs, it will be difficult to help you find the error. – Endzeit Nov 18 '19 at 19:50
  • Some websites will deny traffic if the request doesn't have headers that make it look like a browser. https://stackoverflow.com/questions/802134/changing-user-agent-on-urllib2-urlopen – hurnhu Nov 18 '19 at 20:46
  • @Endzeit : Where can I find the server logs? The website works fine in the browser. – jacobara Nov 18 '19 at 21:25
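
The header suggestion in the comments can be sketched with the standard library. This is only a minimal sketch, not a confirmed fix for this site: the helper name `build_browser_like_request` and the User-Agent string are illustrative assumptions, and nothing in this thread establishes that missing headers cause the 500 response here.

```python
from urllib.request import Request

# Hypothetical User-Agent value; any browser-like string could be used here.
BROWSER_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"


def build_browser_like_request(url, user_agent=BROWSER_USER_AGENT):
    """Return a Request that carries a browser-like User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


# Usage (performs a network call, so it is left commented out):
# from urllib.request import urlopen
# html = urlopen(build_browser_like_request(url))
```

The returned object can be passed to `urlopen` in place of the plain URL string.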

2 Answers

Hey jacobara, I think something is wrong with the site. You can still read the response body like this:

for i in range(5):  # Try up to 5 times to reach the page
    try:
        html = urlopen(url)
    except urllib.error.HTTPError as exc:
        print('Error code: ', exc)
        content = exc.read()  # The error response still has a readable body
        print(content)
        time.sleep(1)  # Wait 1 second, then make the HTTP request again
        continue
    else:
        print('Success')
        break
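
The same retry-and-read idea could be wrapped in a small helper so the caller gets either the response or the last error body back. This is a sketch under assumptions: the name `fetch_with_retries` and its `attempts`, `delay`, and `opener` parameters are hypothetical and are not part of `urllib`.

```python
import time
import urllib.error
from urllib.request import urlopen


def fetch_with_retries(url, attempts=5, delay=1.0, opener=urlopen):
    """Call opener(url) up to `attempts` times.

    Returns (response, None) on success, or (None, last_error_body)
    if every attempt raised HTTPError.
    """
    last_error_body = None
    for _ in range(attempts):
        try:
            return opener(url), None
        except urllib.error.HTTPError as exc:
            # HTTPError doubles as a file-like response, so its body can be read.
            last_error_body = exc.read()
            time.sleep(delay)
    return None, last_error_body
```

With this shape, `html, error_body = fetch_with_retries(url)` leaves `html` as `None` when all attempts fail, so the parsing step can be skipped instead of running on an undefined name.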
Origin

The page makes a POST request that you can mimic directly:

import requests
from bs4 import BeautifulSoup as bs

body = {"terminal": "1,16011301,,", "rows": 1, "visMode": 1}
r = requests.post('http://st.atb.no/New/minskjerm/DataHandler.ashx?type=departureTimes&lang=no', data=body)
soup = bs(r.content, 'lxml')
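
For anyone staying with `urllib` as in the question, roughly the same POST might be built with the standard library alone. This is a sketch, assuming the endpoint accepts the same form-encoded fields shown above; the helper name `build_departure_request` is hypothetical, and the shape of the response is not described in this thread.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint and form fields are copied from the requests example above.
DATA_URL = "http://st.atb.no/New/minskjerm/DataHandler.ashx?type=departureTimes&lang=no"


def build_departure_request(terminal="1,16011301,,", rows=1, vis_mode=1):
    """Build (but do not send) the form-encoded POST request."""
    form = {"terminal": terminal, "rows": rows, "visMode": vis_mode}
    # Supplying `data` makes urllib issue a POST instead of a GET.
    return Request(DATA_URL, data=urlencode(form).encode("ascii"))


# Usage (performs a network call, so it is left commented out):
# from urllib.request import urlopen
# raw = urlopen(build_departure_request()).read()
```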
QHarr