
I want to automatically get content from a web page which is the frontend for a database.

The page contains a list of schools in a certain area. Each name is a hyperlink. I want to get all the details for each school, but those are only available through JavaScript, which opens a popup window with the necessary information in an HTML table.

The database frontend is here: http://www.kultusportal-bw.de/,Lde/Startseite/schulebw/Schuladressen

  1. If I just press Enter in the (empty) search text box, I get a result like this:
    search result

  2. If I then click on the first link "Aach, Grund- und Hauptschule", the JavaScript opens the popup window with the address details, like this:
    details for search result 1

The hyperlink itself is only "javascript:ShowDetails('04146900')", so it does not lead to a separate page but executes a script (which unfortunately exceeds my knowledge). I'd like to automatically copy the name of each hyperlink, together with the HTML content of its popup, into a text or HTML file. How could I do that?


I tried to see what happens using Live HTTP Headers in Firefox. When I click on the link, I get the following result:

https://stewi.kultus-bw.de/didsuche/DienststellenSucheWebService.asmx/GetDienststelle

POST /didsuche/DienststellenSucheWebService.asmx/GetDienststelle HTTP/1.1
Host: stewi.kultus-bw.de
User-Agent: (...)
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Content-Type: application/json; charset=utf-8
X-Requested-With: XMLHttpRequest
Referer: https://stewi.kultus-bw.de/didsuche/
Content-Length: 20
Cookie: ASP.NET_SessionId=3ly0zyatmod1tqoe2sbwwe0p
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache

{'disch':'04146900'}

HTTP/1.1 200 OK
Cache-Control: private, max-age=0
Content-Type: application/json; charset=utf-8
Server: Microsoft-IIS/7.5
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Sun, 05 Jan 2014 11:07:20 GMT
Content-Length: 651

I tried to "simulate" the click on the hyperlink by composing a link like https://stewi.kultus-bw.de/POST/didsuche/DienststellenSucheWebService.asmx/GetDienststelle{'disch':'04146900'} but that does not work.

MostlyHarmless

2 Answers

1

You can use the Chrome debugger to inspect the JavaScript. Anyway, the quick answer to your question is:

The method ShowDetails calls the function 'LoadDetailAnsicht' (the Denglisch is very present here):

function LoadDetailAnsicht(disch) {
    $.ajax({
        type: "POST",
        contentType: "application/json; charset=utf-8",
        url: "DienststellenSucheWebService.asmx/GetDienststelle",
        data: "{'disch':'" + disch + "'}",
        dataType: "json",
        success: function (msg) {
            DetailAnsichtCallback_CallbackComplete(msg.d);
        }
    });
}

It's an AJAX call that sends the ID as JSON in the body of a POST request, and the response comes back as JSON too. Because the data travels in the request body rather than in the address, you cannot get a normal URL for it.
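To make this concrete, here is a minimal sketch of the same request issued outside the page. It assumes Node.js 18 or later (for the built-in `fetch`). The names `ENDPOINT`, `buildDetailRequest` and `fetchDetails` are made up for illustration, the URL is the one from the headers captured in the question and may have moved since, and the server might additionally expect the session cookie or other headers the browser sent.

```javascript
// Hypothetical names; the URL is taken from the headers captured in the
// question and may no longer be current.
const ENDPOINT =
  "https://stewi.kultus-bw.de/didsuche/DienststellenSucheWebService.asmx/GetDienststelle";

// Build fetch() options that mirror the $.ajax call above:
// a POST whose body carries the ID in the same format the page sends.
function buildDetailRequest(disch) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json; charset=utf-8" },
    body: "{'disch':'" + disch + "'}",
  };
}

// Send the request and unwrap the "d" property, just as the page's
// success handler passes msg.d to its callback.
async function fetchDetails(disch) {
  const response = await fetch(ENDPOINT, buildDetailRequest(disch));
  if (!response.ok) {
    throw new Error("Request failed with HTTP status " + response.status);
  }
  const msg = await response.json();
  return msg.d;
}

// Example (performs a real network request, so it is left commented out):
// fetchDetails("04146900").then((details) => console.log(details));
```

The body built for `04146900` is exactly the 20-character payload shown in the question's captured request (`Content-Length: 20`).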

Chris
  • Thanks! I don't necessarily need a normal URL, but I need the content. Sorry, I'm absolutely clueless at the moment, as I do not know AJAX or JavaScript. Is there an easy way to execute this script for each link (each of which has a different ID, like `04146900` in the example above) and write the content of the created window into a file or copy it to the clipboard? – MostlyHarmless Jan 05 '14 at 08:46
  • I also tried to see what happens with Live HTTP Headers in Firefox (see my edit above), but that does not work either. Any hint on how to make it work would be appreciated. – MostlyHarmless Jan 05 '14 at 11:12
  • It's a POST request: the 'disch' data that you tried to append to the URL is sent in the request body instead. Here's an explanation of the difference between GET (which is what you tried to use) and POST: http://www.w3schools.com/tags/ref_httpmethods.asp – Chris Jan 05 '14 at 12:01
  • Thanks for the link. Does "POST requests cannot be bookmarked" mean that I cannot execute this request through a hyperlink, as I tried to? If so, what else could I do? Could I modify the script to display the detailed information in a separate tab? – MostlyHarmless Jan 05 '14 at 22:39
  • Yes, that's what it means. But you can try a command-line utility called curl, which can download things even when a POST request is required. Here are some instructions that might help: http://stackoverflow.com/questions/14978411/http-post-and-get-using-curl-in-linux – Chris Jan 06 '14 at 00:58
0

As Chris mentioned, cURL is the way to go here. You can copy the corresponding cURL call from the Network tab in Chrome DevTools.

The GetDienststelle request is sent by LoadDetailAnsicht when you click on any of the hyperlinks.

Just replace the "DATA_ID" placeholder in the cURL command below with any of the "disch" IDs from the hrefs (e.g. javascript:ShowDetails('04146900')), or extract all IDs from the HTML table and iterate over them.

curl 'https://lobw.kultus-bw.de/didsuche/DienststellenSucheWebService.asmx/GetDienststelle' \
      -H 'Connection: keep-alive' \
      -H 'sec-ch-ua: "Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"' \
      -H 'Accept: application/json, text/javascript, */*; q=0.01' \
      -H 'X-Requested-With: XMLHttpRequest' \
      -H 'sec-ch-ua-mobile: ?0' \
      -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36' \
      -H 'Content-Type: application/json; charset=UTF-8' \
      -H 'Origin: https://lobw.kultus-bw.de' \
      -H 'Sec-Fetch-Site: same-origin' \
      -H 'Sec-Fetch-Mode: cors' \
      -H 'Sec-Fetch-Dest: empty' \
      -H 'Referer: https://lobw.kultus-bw.de/didsuche/' \
      -H 'Accept-Language: en-GB,en-US;q=0.9,en;q=0.8' \
      --data-raw $'{\'disch\':\'DATA_ID\'}' \
      --compressed

Worked for me.
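The "extract all IDs and iterate" step could be sketched roughly like this in Node.js. This is an illustration, not something taken from the site: it assumes the result links look like `<a href="javascript:ShowDetails('04146900')">Name</a>` (the real markup may differ), and `extractSchools`, `saveAllDetails` and the `fetchDetails` parameter are hypothetical names. `fetchDetails` stands for any async function you supply that performs the POST for one ID and returns the details.

```javascript
const fs = require("fs");

// Pull every (disch, name) pair out of the saved search-result HTML.
// Assumed link markup: <a href="javascript:ShowDetails('04146900')">Name</a>
function extractSchools(html) {
  const pattern =
    /<a[^>]*href="javascript:ShowDetails\('([^']+)'\)"[^>]*>([^<]*)<\/a>/g;
  const schools = [];
  for (const match of html.matchAll(pattern)) {
    schools.push({ disch: match[1], name: match[2].trim() });
  }
  return schools;
}

// Fetch the details for each school, one request at a time, and write
// all results to a single JSON file. fetchDetails is supplied by the
// caller: an async function (disch) => details.
async function saveAllDetails(html, fetchDetails, outPath) {
  const results = [];
  for (const school of extractSchools(html)) {
    const details = await fetchDetails(school.disch);
    results.push({ name: school.name, disch: school.disch, details });
  }
  fs.writeFileSync(outPath, JSON.stringify(results, null, 2), "utf8");
  return results.length;
}
```

The `html` argument could be the search result page saved from the browser and read with `fs.readFileSync`. Requests are sent sequentially on purpose, to avoid hitting the server with many parallel calls.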

Lennart