POST proxy.gba search with Python

Question

I'm trying to scrape all news items from this website: http://finansdanmark.dk/nyheder/ The items do not appear in the page source.

I've tried using Firefox's Live HTTP Headers and Chrome's developer tools, but I still can't figure out what goes on behind the scenes.

This is my code so far:

import requests

# POST to the endpoint with no request body
r = requests.post("http://finansdanmark.dk/nyheder/proxy.gba")
text = r.text
print(text)

Can anyone help?

The POST request requires a body. Check it out in the Network tab of Chrome's developer tools. — Curro, May 09 '17 at 16:19
Thanks Curro! I've taken another look at Chrome's network tab and tried this (among other things): `url = 'http://finansdanmark.dk/nyheder/'` `headers = { 'Content-type': 'application/json', 'Accept': 'text/javascript', 'Connection': 'keep-alive', 'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36', }` `r = requests.get(url, headers=headers)` `text = r.text` `print(text)` It still doesn't work. When you say 'body', what does that mean? — bib, May 09 '17 at 17:24
Sorry, I don't know how to write the code so that it is readable on this site. — bib, May 09 '17 at 17:31
Problem solved :-) It turned out I wasn't passing a payload in my POST request. The payload is shown at the bottom of Chrome's 'Network' -> 'Headers' panel. I also had to encode it with json.dumps: `r = requests.post(url, data=json.dumps(payload))`. Inspiration: http://stackoverflow.com/questions/15694120/why-does-http-post-request-body-need-to-be-json-enconded-in-python — bib, May 11 '17 at 08:31
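For anyone landing here later, the fix described in the last comment can be sketched roughly as follows. This is only a minimal sketch: the payload field names (`page`, `pageSize`) are hypothetical placeholders, since the thread never shows the real payload, and the helper names `encode_body` and `fetch_news` are made up for illustration. Setting the `Content-Type` header is also an assumption; the original fix only mentions `json.dumps`. Copy the actual payload from the bottom of Chrome's 'Network' -> 'Headers' panel.

```python
import json

# URL taken from the question. The payload fields are HYPOTHETICAL
# placeholders; replace them with the real payload shown in Chrome.
URL = "http://finansdanmark.dk/nyheder/proxy.gba"
EXAMPLE_PAYLOAD = {"page": 1, "pageSize": 10}


def encode_body(payload):
    """Serialize the payload dict to the JSON string used as the POST body."""
    return json.dumps(payload)


def fetch_news(url, payload):
    """POST the JSON-encoded payload and return the response text."""
    # Third-party dependency, imported here so encode_body can be
    # tried out without requests installed.
    import requests

    response = requests.post(
        url,
        data=encode_body(payload),
        # Assumption: the endpoint expects a JSON content type.
        headers={"Content-Type": "application/json"},
        timeout=10,
    )
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text
```

Calling `print(fetch_news(URL, EXAMPLE_PAYLOAD))` would send the request. The "body" is simply the string that `encode_body` returns: passing a plain dict to `data=` makes requests form-encode it instead, which is why `json.dumps` was needed. `requests.post(url, json=payload)` is a shorter alternative that serializes the dict and sets the JSON content type in one step.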
