
I am experimenting with JSoup, and I cannot get my 2nd go-around with my Scanner to work. It skips directly to my catch statement.

Here is a description of the program:

I take a Google search term as user input (String). Next, I ask for the number of query results the user wishes to see, and the user enters an integer.

I loop through each element that is returned and add it to an ArrayList. The String displayed on the console consists of an index, Link Text, and a hyperlink.

I then want to ask the user which index they would like to enter, to open a browser window leading to that link. This is done by concatenating the href string with the Linux terminal command "xdg-open " using the Runtime class.

It works great up until it's time to ask which index will be chosen.

Here is my code:

/**
 * Created by christopher on 4/26/16.
 */

import java.io.IOException;
import java.util.ArrayList;
import java.util.Scanner;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;


public class GoogleSearchJava {

    static int index;
    static String linkHref;
    static Scanner input;

    public static final String GOOGLE_SEARCH_URL = "https://www.google.com/search";

    public static void main(String[] args) throws IOException {

        //GET INPUT FOR SEARCH TERM

        input = new Scanner(System.in);
        System.out.print("Search: ");
        String searchTerm = input.nextLine();
        System.out.print("Enter number of query results: ");
        int num = input.nextInt();

        String searchURL = GOOGLE_SEARCH_URL + "?q=" + searchTerm + "&num=" + num;

        //NEED TO DEFINE USER AGENT TO PREVENT 403 ERROR.
        Document document = Jsoup.connect(searchURL).userAgent("Mozilla/5.0").get();

        //OPTION TO DISPLAY HTML FILE IN BROWSER. DON'T KNOW YET.
        //System.out.println(document.html());

        //If google search results HTML change the <h3 class="r" to <h3 class ="r1"
        //need to change below stuff accordingly
        Elements results = document.select("h3.r > a");

        index = 0;
        String news = "News";
        ArrayList<String> displayResults = new ArrayList<>();
        for (Element result : results) {
            index++;
            linkHref = result.attr("href");
            String linkText = result.text();
            String pingResult = index + ": " + linkText + ", URL:: " + linkHref.substring(6, linkHref.indexOf("&")) + "\n";

            if (pingResult.contains(news)) {
                System.out.println("FOUND " + "\"" + linkText + "\"" + "NO HYPERTEXT FOR NEWS QUERY RESULTS AT THIS TIME. SKIPPED INDEX.");
                System.out.println();
            } else {
                displayResults.add(pingResult);
            }
        }
        for(String urlString : displayResults) {
            System.out.println(urlString);
        }
        System.out.println();

        goToURL(input, displayResults);
    }
    public static int goToURL(Scanner input, ArrayList<String> resultList) {

        int newIndex = 0;

        try {

            System.out.print("Enter Index (i.e. 1, 2, etc) you wish to visit, 0 to exit: ");

            newIndex = input.nextInt();
            input.nextLine();

            for (String string : resultList) {

                if(string.startsWith(String.valueOf(newIndex))) {

                    Process process = Runtime.getRuntime().exec("xdg-open " + string.substring(6, string.indexOf("&")));
                    process.waitFor();
                }
            }
        } catch (Exception e) {
            System.out.println("ERROR while parsing URL");
        }
        return newIndex;
    }
}

Here is the output. Notice how it stops after I enter "1". (No, I haven't handled entering "0" yet.)

Search: Oracle
Enter number of query results: 3
1: Oracle | Integrated Cloud Applications and Platform Services, URL:: =http://www.oracle.com/

2: Oracle Corporation - Wikipedia, the free encyclopedia, URL:: =https://en.wikipedia.org/wiki/Oracle_Corporation

3: Oracle on the Forbes America's Best Employers List, URL:: =http://www.forbes.com/companies/oracle/


Enter Index (i.e. 1, 2, etc) you wish to visit, 0 to exit: 1
ERROR while parsing URL

Process finished with exit code 0
DevOpsSauce
  • See also [When Runtime.exec() won't](http://www.javaworld.com/article/2071275/core-java/when-runtime-exec---won-t.html) for many good tips on creating and handling a process correctly. Then ignore that it refers to `exec`, and use a `ProcessBuilder` to create the process. – Andrew Thompson Apr 27 '16 at 02:42
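
Following that comment's advice, here is a minimal sketch of launching `xdg-open` through `ProcessBuilder` instead of `Runtime.exec(String)`. The URL is passed as its own argument, so it is never split on whitespace the way a single command string is. It assumes a Linux desktop with `xdg-open` on the `PATH`; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.util.List;

class XdgOpen {

    // Builds the command as separate arguments, so the URL is never split on spaces.
    static List<String> command(String url) {
        return List.of("xdg-open", url);
    }

    // Starts xdg-open, forwards its output to this JVM's console, and waits for it to exit.
    static int open(String url) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command(url));
        builder.inheritIO();            // show any error messages printed by xdg-open
        Process process = builder.start();
        return process.waitFor();       // exit code; 0 means success for xdg-open
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            int exitCode = open("http://www.oracle.com/");
            System.out.println("xdg-open exited with " + exitCode);
        } catch (IOException e) {
            // Thrown by start() if, for example, xdg-open is not installed.
            System.out.println("Could not start xdg-open: " + e.getMessage());
        }
    }
}
```

Checking the exit code and letting the child's output reach the console is what makes failures visible, instead of the process silently doing nothing.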

1 Answer


ERROR while parsing URL suggests that the error comes from:

try {

    System.out.print("Enter Index (i.e. 1, 2, etc) you wish to visit, 0 to exit: ");

    newIndex = input.nextInt();
    input.nextLine();

    for (String string : resultList) {

        if(string.startsWith(String.valueOf(newIndex))) {

            Process process = Runtime.getRuntime().exec("xdg-open " + string.substring(6, string.indexOf("&")));
            process.waitFor();
        }
    }
} catch (Exception e) {
    System.out.println("ERROR while parsing URL");
}
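
Because `catch (Exception e)` replaces whatever went wrong with one fixed message, printing `e` itself shows the real cause. A minimal sketch that repeats the `substring` call on a string in the same format the program prints: those display strings were already cut off before the `&`, so `indexOf("&")` returns -1 and `substring(6, -1)` throws `StringIndexOutOfBoundsException`. The class and method names are hypothetical.

```java
class SubstringDiagnosis {

    // Mirrors the call in goToURL: cut the string from index 6 up to the first '&'.
    static String extract(String displayString) {
        return displayString.substring(6, displayString.indexOf("&"));
    }

    public static void main(String[] args) {
        // Same shape as a line the program prints; note that it contains no '&'.
        String display = "1: Oracle | Integrated Cloud Applications and Platform Services, URL:: =http://www.oracle.com/\n";
        try {
            System.out.println(extract(display));
        } catch (StringIndexOutOfBoundsException e) {
            // Printing the exception reveals the actual failure instead of a generic message.
            System.out.println("Caught: " + e);
        }
    }
}
```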

I am not working on Linux, so I can't test it, but I suspect that your URL shouldn't start with =. Notice that your console shows URL:: =... even though your print statement doesn't contain this =, so it is part of the address you are trying to visit.

So in .substring(6, hRef.indexOf("&")), change 6 to 7.
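
A minimal sketch of the 6 → 7 fix, made slightly more defensive. It assumes Google's result links have the form `/url?q=<target>&sa=...` (the format implied by the `URL:: =...` output), skips the whole `/url?q=` prefix (7 characters), and takes the rest of the string when there is no `&` instead of failing. Decoding with `URLDecoder` is an addition not in the original code, and the class name is hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class GoogleHref {

    private static final String PREFIX = "/url?q=";   // assumed redirect-link prefix

    // Returns the target URL from a Google redirect href, or the href unchanged
    // if it does not start with the assumed prefix.
    static String extractTarget(String href) {
        if (!href.startsWith(PREFIX)) {
            return href;
        }
        int start = PREFIX.length();                   // 7, i.e. just past the '='
        int end = href.indexOf('&', start);
        String encoded = (end == -1) ? href.substring(start) : href.substring(start, end);
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(extractTarget("/url?q=http://www.oracle.com/&sa=U"));
    }
}
```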


Another problem is that hRef is set to linkHref, which will be the last result from Google that you picked. You should probably create your own class that stores the proper href and its description, or pass a list of the Element objects representing the <a ...>..</a> elements you picked. (Also, you don't need to find elements in the list based on their 1: ... format; simply use list.get(index - 1) if you want to map 1 to index 0, 2 to index 1, and so on.)
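
A minimal sketch of that suggestion: keep each result's text and URL together in a small class and select by position with `list.get(index - 1)`, instead of matching on the `1: ...` prefix (which would also let `startsWith("1")` match `10: ...`). `SearchResult` and `pick` are hypothetical names; in the real program the list would be filled from the JSoup `Element`s.

```java
import java.util.ArrayList;
import java.util.List;

class ResultPicker {

    // Hypothetical holder pairing a result's link text with its target URL.
    static final class SearchResult {
        final String text;
        final String url;

        SearchResult(String text, String url) {
            this.text = text;
            this.url = url;
        }
    }

    // Maps the user's 1-based choice to a list element; returns null for 0 or out-of-range input.
    static SearchResult pick(List<SearchResult> results, int choice) {
        if (choice < 1 || choice > results.size()) {
            return null;
        }
        return results.get(choice - 1);
    }

    public static void main(String[] args) {
        List<SearchResult> results = new ArrayList<>();
        results.add(new SearchResult("Oracle | Integrated Cloud Applications and Platform Services", "http://www.oracle.com/"));
        results.add(new SearchResult("Oracle Corporation - Wikipedia, the free encyclopedia", "https://en.wikipedia.org/wiki/Oracle_Corporation"));

        // Print the numbered menu from the stored objects rather than from preformatted strings.
        for (int i = 0; i < results.size(); i++) {
            System.out.println((i + 1) + ": " + results.get(i).text + ", URL:: " + results.get(i).url);
        }

        SearchResult chosen = pick(results, 2);
        System.out.println(chosen == null ? "No such index" : "Opening " + chosen.url);
    }
}
```

Because the URL is stored separately from the display text, nothing has to be parsed back out of the printed line when the user makes a choice.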


One last piece of advice for now: you can make your code more OS-independent with the solution described in How to open the default webbrowser using java, rather than trying to execute xdg-open.
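
A minimal sketch of the `java.awt.Desktop` approach. It assumes a desktop session; in a headless JVM `Desktop.isDesktopSupported()` returns false, so the method reports failure rather than throwing. The URL is validated as a `URI` first. The class and method names are hypothetical.

```java
import java.awt.Desktop;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

class BrowserOpener {

    // Opens the URL in the default browser if the platform supports it.
    // Returns false when Desktop browsing is unavailable (for example, in a headless JVM).
    static boolean openInBrowser(String url) throws IOException, URISyntaxException {
        URI uri = new URI(url);   // throws URISyntaxException for malformed input
        if (!Desktop.isDesktopSupported()) {
            return false;
        }
        Desktop desktop = Desktop.getDesktop();
        if (!desktop.isSupported(Desktop.Action.BROWSE)) {
            return false;
        }
        desktop.browse(uri);
        return true;
    }

    public static void main(String[] args) throws Exception {
        if (!openInBrowser("http://www.oracle.com/")) {
            System.out.println("Desktop browsing is not supported here; fall back to xdg-open or print the URL.");
        }
    }
}
```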

Pshemo
  • I got the Desktop class code working from the above link, but I'm not sure how to implement your suggestion for the linkHRef variable. – DevOpsSauce Apr 26 '16 at 22:37
  • @IRGeekSauce I tried to simplify your code a little. You can find it here: http://pastebin.com/VTNey8H8 (I don't want to add the full solution here, as it would make this answer too broad). – Pshemo Apr 26 '16 at 23:03
  • Go ahead and post your code here. I tried it out and it works PERFECTLY. I will gladly accept your answer. :) – DevOpsSauce Apr 26 '16 at 23:20
  • One problem, though: if I change the number to 5, it gives me 8 query results. But your answer still solved my initial question. – DevOpsSauce Apr 26 '16 at 23:30
  • @IRGeekSauce I am not sure what you mean. If I set `int num = 5;` then your query will add `&num=5` parameter which should set max results to 5. – Pshemo Apr 27 '16 at 00:21
  • I replaced '3' with '5', and this was my output: 1 Oracle | Integrated Cloud Applications and Platform Services 2 Oracle Careers 3 Oracle Database 4 About Oracle 5 Support 6 Downloads 7 Java 8 Oracle Corporation - Wikipedia, the free encyclopedia – DevOpsSauce Apr 27 '16 at 01:00
  • I am afraid I can't help you here. Maybe Google handles the parameters provided in the URL differently. Try printing your `searchURL` and visiting it in your browser to see whether it really contains 8 results. – Pshemo Apr 27 '16 at 01:07
  • I fixed it. At the line "for(Element a : filteredResults)" I added a while loop with the condition while(index <= num), and it produced the expected number of results. – DevOpsSauce Apr 27 '16 at 01:07
  • It is a workaround, but it is strange that you are getting these results in the first place. – Pshemo Apr 27 '16 at 01:08
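
A minimal sketch of capping the printed results at `num` with a bounded copy of the list, which has the same effect as the `while(index <= num)` workaround in the comments. It assumes the filtered results are already collected in a list; `firstN` is a hypothetical helper, and the sample titles come from the output quoted in the comments.

```java
import java.util.ArrayList;
import java.util.List;

class LimitResults {

    // Returns at most `num` items, keeping their original order.
    static <T> List<T> firstN(List<T> items, int num) {
        int count = Math.max(0, Math.min(num, items.size()));
        return new ArrayList<>(items.subList(0, count));
    }

    public static void main(String[] args) {
        List<String> titles = List.of(
                "Oracle | Integrated Cloud Applications and Platform Services",
                "Oracle Careers", "Oracle Database", "About Oracle", "Support",
                "Downloads", "Java", "Oracle Corporation - Wikipedia, the free encyclopedia");

        // Only the first five titles are printed, numbered from 1.
        List<String> shown = firstN(titles, 5);
        for (int i = 0; i < shown.size(); i++) {
            System.out.println((i + 1) + " " + shown.get(i));
        }
    }
}
```

Clamping the count also covers the case where Google returns fewer results than requested.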