Extracting HTML from a generated website using YQL and htmlstring

Question

I'm trying to extract the train connections from the Deutsche Bahn website www.reiseauskunft.de and show them on an info display (which just shows a simple HTML page with some JavaScript).

So I want to put this information (the next available connections) into my HTML page.
Deutsche Bahn provides a kind of API, or at least something that looks like one:

www.reiseauskunft.bahn.de/bin/query.exe/dn?S=MainzHbf&Z=Frankfurt(Main)Hbf&timeSel=depart&start=1

This link works and returns a complete web page with the next three connections from the start station (S) to the target station (Z), using the current time as the departure time (the start=1 parameter just executes the request).
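Since the query is just a set of GET parameters, it can be assembled programmatically instead of by string concatenation. A minimal sketch, assuming only the four parameters from the example URL (S, Z, timeSel, start; the parameter document lists many more); buildQueryUrl is a hypothetical name:

```javascript
// Hypothetical helper: builds the reiseauskunft.bahn.de query URL from
// the parameters used in the example link.
function buildQueryUrl(from, to) {
  const url = new URL("http://www.reiseauskunft.bahn.de/bin/query.exe/dn");
  url.searchParams.set("S", from);           // start station
  url.searchParams.set("Z", to);             // target station
  url.searchParams.set("timeSel", "depart"); // treat the time as departure time
  url.searchParams.set("start", "1");        // execute the search immediately
  return url.toString();
}

console.log(buildQueryUrl("MainzHbf", "Frankfurt(Main)Hbf"));
// http://www.reiseauskunft.bahn.de/bin/query.exe/dn?S=MainzHbf&Z=Frankfurt%28Main%29Hbf&timeSel=depart&start=1
```

URLSearchParams percent-encodes the parentheses as %28/%29; the server decodes them back, so the parameter values are unchanged.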

You can find more information about the parameters here (German only): www.geiervally.lechtal.at/sixcms/media.php/1405/Parametrisierte%20%DCbergabe%20Bahnauskunft(V%205.12-R4.30c,%20f%FCr.pdf

Because the html table no longer seems to be supported, I found the suggestion to use htmlstring instead (see "YQL: html table is no longer supported").

I adapted the example to my needs:

var site = "http://www.reiseauskunft.bahn.de/bin/query.exe/dn?S=MainzHbf&Z=Frankfurt(Main)Hbf&timeSel=depart&start=1";
var yql = "select * from htmlstring where url='" + site;
var resturl = "http://query.yahooapis.com/v1/public/yql?q=" + encodeURIComponent(yql) + "&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys";

but the browser showed the error "Query syntax error(s) [line 1:140 mismatched character ' ' expecting ''']". I can't find a " " at this position, and I wouldn't put a "'" there.
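The error position points at the likely cause: the yql string in the snippet opens a quote before the URL (url=') but never closes it. The prefix is 36 characters and the URL 104, so the query is exactly 140 characters long and the parser reaches its end at position 140 while still inside the string literal, hence "expecting '''". A corrected sketch (buildYqlUrl is a hypothetical name; note that Yahoo has since shut down the public YQL service, and this fixes only the syntax error, not the robots.txt restriction):

```javascript
const site = "http://www.reiseauskunft.bahn.de/bin/query.exe/dn?S=MainzHbf&Z=Frankfurt(Main)Hbf&timeSel=depart&start=1";

// Original version: the quoted url literal is opened but never closed.
const broken = "select * from htmlstring where url='" + site;
console.log(broken.length); // 140, the column reported in the syntax error

// Corrected version: append the closing single quote after the URL.
function buildYqlUrl(pageUrl) {
  const yql = "select * from htmlstring where url='" + pageUrl + "'";
  return "http://query.yahooapis.com/v1/public/yql?q=" + encodeURIComponent(yql) +
    "&format=json&env=" + encodeURIComponent("store://datatables.org/alltableswithkeys");
}

console.log(buildYqlUrl(site));
```

Encoding the env value with encodeURIComponent produces the same store%3A%2F%2Fdatatables.org%2Falltableswithkeys string that was hard-coded in the question.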

In the YQL console I entered the following:

select * from htmlstring where url='http://www.reiseauskunft.bahn.de/bin/query.exe/dn?S=MainzHbf&Z=Frankfurt(Main)Hbf&timeSel=depart&start=1'

and there I got the exception: Redirected to a robots.txt restricted URL.

Are both messages caused by the same problem? Can I bypass the robots.txt message (does YQL act like a robot towards reiseauskunft.de)? Is there any chance to retrieve the train connections with YQL?
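The robots.txt message is a separate issue from the syntax error: it suggests YQL checks the target site's robots.txt (as well-behaved crawlers do) and the URL it was redirected to is disallowed there. A simplified sketch of the prefix matching such a crawler performs; isAllowed and the sample rules are hypothetical and are NOT the real robots.txt of bahn.de, and real parsers also handle Allow lines, wildcards and agent-specific groups:

```javascript
// Simplified robots.txt check: only groups for "User-agent: *" and plain
// "Disallow:" path prefixes are considered (no Allow lines, no wildcards).
function isAllowed(robotsTxt, path) {
  let appliesToAll = false; // does the current group target every crawler?
  let inAgentLines = false; // consecutive User-agent lines share one group
  const disallowed = [];
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // strip comments
    const sep = line.indexOf(":");
    if (sep === -1) continue; // blank or malformed line
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      appliesToAll = (inAgentLines && appliesToAll) || value === "*";
      inAgentLines = true;
    } else {
      inAgentLines = false;
      // An empty Disallow value means "allow everything", so skip it.
      if (field === "disallow" && appliesToAll && value !== "") {
        disallowed.push(value);
      }
    }
  }
  return !disallowed.some((prefix) => path.startsWith(prefix));
}

// Hypothetical rules for illustration only.
const sampleRobots = "User-agent: *\nDisallow: /bin/\n";
console.log(isAllowed(sampleRobots, "/bin/query.exe/dn")); // false
console.log(isAllowed(sampleRobots, "/index.html"));       // true
```

A polite crawler simply refuses disallowed URLs, which is why this restriction cannot be bypassed from within YQL.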

Thanks in advance

Edit: It seems my approach with YQL will not work, so I will try another approach. Question closed?

Can you elaborate on which information you want to extract? Also, it is possible that the redirect means you don't have permission to read/scrape the content. — Mauricio Arias Olave, Jun 29 '17 at 21:51
The response delivers a web page with the next three connections from the current time on. I want to extract the start time, the delay (if any) and the train number/type of train. The information is stored in an element with id=resultsOverview, and inside it there are three elements with class="boxShadow scheduledCon". From those I can extract the data I want. I don't know whether the redirection is allowed or how I can find out. Since I can open the web page with the link from my previous posting, I think reading/scraping should be possible?! — Harald.m, Jul 01 '17 at 18:55
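Once the rendered results page is available in a browser context (for example in the browser console on that page), the fields can be read with DOM queries. A sketch assuming the structure described above: a container with id resultsOverview holding one element per connection with the classes boxShadow and scheduledCon. The inner selectors (.time, .delay, .products) and the function names are pure guesses and must be checked against the real markup:

```javascript
// Hypothetical selectors; inspect the real page to find the right ones.
const DEFAULT_SELECTORS = {
  connection: "#resultsOverview .boxShadow.scheduledCon",
  start: ".time",
  delay: ".delay",
  train: ".products",
};

// Trimmed text of the first match inside el, or "" if nothing matches.
function textOf(el, selector) {
  const node = el.querySelector(selector);
  return node ? node.textContent.trim() : "";
}

// root can be a Document or an Element (anything with querySelectorAll).
function extractConnections(root, selectors = DEFAULT_SELECTORS) {
  return Array.from(root.querySelectorAll(selectors.connection)).map((el) => ({
    start: textOf(el, selectors.start),
    delay: textOf(el, selectors.delay) || null, // null when no delay is shown
    train: textOf(el, selectors.train),
  }));
}

// In the browser console on the results page:
// console.table(extractConnections(document));
```

Passing the selectors as a parameter keeps the guesses in one place, so they can be corrected without touching the extraction logic.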
Harald.m, please note what I said in my previous comment: "it is possible that the redirect means you don't have permission to read/scrape the content." If you try with news.google.com, you'll get the same result. My suggestion is to try another approach. — Mauricio Arias Olave, Jul 04 '17 at 13:33
Mauricio Arias Olave, I retrieved the HTML source of the URL with a Python script (similar to curl) and compared it with the source code in the browser, and they are not the same. As I understand it, some JavaScript is executed on the page, so I cannot extract the information from the raw HTML source, and YQL doesn't work either because it cannot execute that JavaScript. Am I right? Whatever the cause, I will try another approach as you suggested. Thanks anyway. — Harald.m, Jul 04 '17 at 18:46
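The comparison described above can be scripted: download the raw HTML without executing any JavaScript (as the Python script and YQL do) and check whether the container id appears in it. A sketch for Node.js 18 or later, which provides a global fetch; pageContainsId and fetchRawHtml are hypothetical names, and a missing id is only a hint, not proof, that the content is rendered client-side (cookies or redirects can also change the response):

```javascript
// True if the HTML source contains an element with the given id attribute.
// Plain string matching: good enough for a quick check, not a real parser.
function pageContainsId(html, id) {
  return html.includes(`id="${id}"`) || html.includes(`id='${id}'`);
}

// Downloads the page source as served (redirects are followed);
// no JavaScript is executed.
async function fetchRawHtml(url) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.text();
}

// Usage (requires network access):
// fetchRawHtml("http://www.reiseauskunft.bahn.de/bin/query.exe/dn?S=MainzHbf&Z=Frankfurt(Main)Hbf&timeSel=depart&start=1")
//   .then((html) => console.log(pageContainsId(html, "resultsOverview")));
```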
