I'm trying to extract data on routes between airports from a website. The user picks two airports, and the program should then show all the routes between them on a given day. However, after searching for a route on the site, the URL always changes to the same .asp page, no matter which route you are looking at. Is there a way to scrape the data for a specific route without knowing its URL, or is there a way to obtain the true URL?

Bob Smith

2 Answers

I would recommend using jsoup for this. To do so, add the following dependency to your pom.xml:

<dependency>
  <groupId>org.jsoup</groupId>
  <artifactId>jsoup</artifactId>
  <version>1.11.2</version>
</dependency>

Then send an initial request just to obtain the session cookies:

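    // Needs: import org.jsoup.Connection; import org.jsoup.Jsoup; import java.util.Map;
    // `headers` is the map shown further down; `userAgent` is any browser-like User-Agent string.
    Connection.Response initialPage = Jsoup.connect("https://www.flightview.com/flighttracker/")
            .headers(headers)
            .method(Connection.Method.GET)
            .userAgent(userAgent)
            .execute();
    // Keep the cookies the server set so the search request can reuse them
    Map<String, String> initialCookies = initialPage.cookies();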

Then send the search itself as a POST request, passing those cookies along:

    Connection.Response flights = Jsoup.connect("https://www.flightview.com/TravelTools/FlightTrackerQueryResults.asp")
            .userAgent(userAgent)
            .headers(headers)
            .data(postData)
            .cookies(initialCookies)
            .method(Connection.Method.POST)
            .execute();

The postData and headers used in these requests are as follows (declare them before making the requests):

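    // Needs: import java.util.HashMap; import java.util.Map;
    // Hypothetical User-Agent value; the original answer does not show the one it used.
    String userAgent = "Mozilla/5.0";
    Map<String, String> postData = new HashMap<>();
    Map<String, String> headers = new HashMap<>();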

    headers.put("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8");
    // "br" is left out here: jsoup decodes gzip and deflate, but a Brotli response would not be decoded
    headers.put("Accept-Encoding", "gzip, deflate");
    headers.put("Accept-Language", "en-US,en;q=0.9");
    headers.put("Cache-Control", "no-cache");
    headers.put("DNT", "1");
    headers.put("Pragma", "no-cache");
    headers.put("Upgrade-Insecure-Requests", "1");

    postData.put("qtype", "cpi");
    postData.put("sfw", "/FV/FlightTracker/Main");
    postData.put("namdep", "DFW Dallas, TX (Dallas/Ft Worth) - Dallas Fort Worth International");
    postData.put("depap", "DFW");
    postData.put("namarr", "JFK New York, NY (Kennedy) - John F Kennedy International");
    postData.put("arrap", "JFK");
    postData.put("namal2", "Enter name or code");
    postData.put("al", "");
    postData.put("whenArrDep", "dep");
    postData.put("whenHour", "all");
    postData.put("whenDate", "20180321");
    postData.put("input", "Track Flight");
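
The values above are hard-coded for one route and one day. Since the goal is to let the user pick the airports and date, here is a minimal sketch of building the same map from input. The class and method names (`RouteQuery`, `buildPostData`) are hypothetical, and the `yyyyMMdd` date format is an assumption inferred only from the sample value `20180321`. It also leaves out the display-name fields (`namdep`, `namarr`, `namal2`); whether the server needs them is untested, so add them back if the request fails.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

public class RouteQuery {

    /** Builds the form fields for one route and day (field names copied from the answer above). */
    public static Map<String, String> buildPostData(String depCode, String arrCode, LocalDate day) {
        Map<String, String> postData = new LinkedHashMap<>();
        postData.put("qtype", "cpi");
        postData.put("sfw", "/FV/FlightTracker/Main");
        postData.put("depap", depCode);   // departure airport code, e.g. "DFW"
        postData.put("arrap", arrCode);   // arrival airport code, e.g. "JFK"
        postData.put("al", "");           // no airline filter
        postData.put("whenArrDep", "dep");
        postData.put("whenHour", "all");
        // Assumption: the site expects yyyyMMdd, as in the sample value "20180321".
        postData.put("whenDate", day.format(DateTimeFormatter.BASIC_ISO_DATE));
        postData.put("input", "Track Flight");
        return postData;
    }

    public static void main(String[] args) {
        Map<String, String> data = buildPostData("DFW", "JFK", LocalDate.of(2018, 3, 21));
        System.out.println(data);
    }
}
```

The returned map can be passed to `.data(...)` in place of the hard-coded `postData`.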

Once you have the response, you can parse it and print the fields you need:

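    // Needs: import org.jsoup.nodes.Document; import org.jsoup.nodes.Element; import org.jsoup.select.Elements;
    String page = flights.body();
    // System.out.println(page);  // uncomment to inspect the raw HTML
    Document doc = Jsoup.parse(page);
    // Each result row carries one of these two CSS classes
    Elements elems = doc.select("tr.FlightTrackerListRowOdd, tr.FlightTrackerListRowEven");

    for (Element element : elems) {
        Elements childElems = element.select("td");
        if (childElems.size() < 2) {
            continue; // skip rows without the expected cells
        }
        String text1 = childElems.get(0).text(); // first cell (airline name in the output below)
        String text2 = childElems.get(1).text(); // second cell (flight number in the output below)
        System.out.println(text1 + " " + text2);
    }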

This prints:

Aeroflot Airlines 3453
Aeroflot Airlines 3455
AeroMexico 4966
AeroMexico 4935
Air France 2535
Alitalia 3403
American Airlines 1294
British Airways 1880
China Eastern Airlines 8804
Delta Air Lines 3869
Delta Air Lines 3789
Etihad Airways 3040
Finnair 5726
Gulf Air 4139
Iberia Airlines 4043
Jet Airways 7692
KLM Royal Dutch Airlines 6597
KLM Royal Dutch Airlines 8117
Korean Air 7326
Malaysia Airlines 9442
Qatar Airways 5107
TAM Brazilian Airlines 8379
Virgin Atlantic 4620
Virgin Atlantic 3471

From here you can adapt the code to your needs; this is just an example of how to approach it.

Tarun Lalwani

Open the developer tools in your browser, enter the departure and arrival airports in the search box, and submit the form.

If you then check the requests the browser sent to the server, you will notice a POST request, carrying the form data you just submitted, to https://www.flightview.com/TravelTools/FlightTrackerQueryResults.asp

To scrape this data, you can send a POST request to this URL yourself, for example with Python's requests module.

NOTE: since you are using Java, you can still send a simple POST request. You can check how to send a POST request here
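
For reference, a form POST can be sent with nothing but the JDK, so no Apache library is needed. What follows is a minimal sketch rather than a tested scraper for this site: the class and method names (`FormPost`, `encodeForm`, `post`) are hypothetical, the response is assumed to be UTF-8 text, and it does not handle cookies, which the site may require (the other answer fetches them with a first GET request).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class FormPost {

    // URL-encodes one key or value using UTF-8.
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Encodes the fields as an application/x-www-form-urlencoded body. */
    public static String encodeForm(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(enc(e.getKey()) + "=" + enc(e.getValue()));
        }
        return body.toString();
    }

    /** Sends the fields as a form POST and returns the response body as text. */
    public static String post(String url, Map<String, String> fields) throws IOException {
        byte[] body = encodeForm(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true); // we are going to write a request body
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        // Assumption: the page is served as UTF-8 text.
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("depap", "DFW");
        fields.put("arrap", "JFK");
        // Only prints the encoded body; call post(url, fields) to actually send it.
        System.out.println(encodeForm(fields));
    }
}
```

The field names to send are whatever the developer tools show under the request's form data; `post(...)` returns the HTML of the results page as a string, which you can then parse.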

Gaur93
  • Thank you for the response, but I should have clarified: this needs to be done in a Java program. – Bob Smith Feb 03 '18 at 00:52
  • @BobSmith I have modified the answer for java request – Gaur93 Feb 03 '18 at 06:23
  • I'm not sure I understand how to send a java post request, even after examining the answers in the link you gave me. The top answer uses Apache, which I'd rather not use for this, and all of the following answers seem to input a username and password into the site in order to send a post request. All I ultimately want to be able to do is read the information on the .asp page so that the user can see all flights between those destinations on a given day. I appreciate your answer, I just fear that I don't know how to go about attempting your solution. – Bob Smith Feb 03 '18 at 23:44
  • If it's not too much trouble would you be able to go about explaining how I can actually send a java post request in order to read data from this .asp protected domain. I'm afraid I just don't understand the solutions given by the link you provided. – Bob Smith Feb 21 '18 at 18:08