I'm working with JSP, but I need to do this in a Java servlet. Inside the servlet I already have the URL entered by the user, stored in a String. All that is left is to get the page title from that URL, that is, the contents of the title tag in the page's HTML. I have never done this before, so I was wondering if anyone could give me some pointers on how to do it.

For example, let's say I want to get the page title from http://www.computerhope.com/issues/ch000746.htm

When I look at the HTML source, the title is "How to view the HTML source code of a web page", as shown here:

<title>How to view the HTML source code of a web page </title>

So how can I read that value from inside a Java program?

Sarah

3 Answers

Try this one.

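import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class URLTest {

    public static void main(String[] args) {
        String url = "http://www.google.com";
        // try-with-resources closes the stream and the scanner, even if opening or reading fails
        // (the charset is assumed to be UTF-8 here; the page may declare a different one)
        try (InputStream response = new URL(url).openStream();
             Scanner scanner = new Scanner(response, StandardCharsets.UTF_8.name())) {
            // "\\A" matches the start of input, so next() returns the whole response body
            scanner.useDelimiter("\\A");
            String responseBody = scanner.hasNext() ? scanner.next() : "";
            int start = responseBody.indexOf("<title>");
            int end = responseBody.indexOf("</title>");
            if (start >= 0 && end > start) {
                System.out.println(responseBody.substring(start + "<title>".length(), end));
            } else {
                System.out.println("No <title> tag found");
            }
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}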
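
The plain indexOf search above only matches a lowercase <title> with no attributes. As a rough sketch (not part of the original answer), a case-insensitive regular expression is a bit more tolerant. The class and method names here, TitleExtractor and extractTitle, are made up for illustration, and this approach can still be fooled by a title tag inside a comment or script.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleExtractor {

    // Matches <title>, <TITLE>, or <title lang="en">, and captures everything up to the closing tag.
    // DOTALL lets the title span several lines.
    private static final Pattern TITLE = Pattern.compile(
            "<title\\b[^>]*>(.*?)</title\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the first title found in the HTML text, or null if there is none. */
    public static String extractTitle(String html) {
        if (html == null) {
            return null;
        }
        Matcher m = TITLE.matcher(html);
        if (!m.find()) {
            return null;
        }
        // Trim the ends and collapse internal runs of whitespace into single spaces
        return m.group(1).trim().replaceAll("\\s+", " ");
    }
}
```

You would pass it the responseBody string read from the URL, for example String title = TitleExtractor.extractTitle(responseBody);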
Lahiru Ashan

The problem with searching the HTML string is that the title tag could also appear inside a comment. XML parsers don't work either. But there is something in the JDK from the good old Swing days:

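    import java.io.InputStreamReader;
    import java.net.URL;
    import javax.swing.text.html.HTMLDocument;
    import javax.swing.text.html.HTMLEditorKit;
    import javax.swing.text.html.parser.ParserDelegator;

    public class SwingTitleTest {
        public static void main(String[] args) throws Exception {
            HTMLEditorKit htmlKit = new HTMLEditorKit();
            HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
            HTMLEditorKit.Parser parser = new ParserDelegator();
            // The final "true" tells the parser to ignore charset directives in the page
            parser.parse(new InputStreamReader(new URL("https://stackoverflow.com/questions/40099397/how-can-i-get-the-page-title-information-from-a-url-in-java/40099983").openStream()),
                    htmlDoc.getReader(0), true);

            System.out.println(htmlDoc.getProperty("title"));
        }
    }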
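
If you only need the title and already have the HTML as a String, a variation on the same Swing parser is to skip the HTMLDocument and collect the title text in a ParserCallback. This is only a sketch under that assumption. TitleParser and parseTitle are hypothetical names, not part of the JDK. Because the parser reports comments separately through handleComment, a title tag inside a comment is not picked up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class TitleParser {

    /** Returns the text of the first title element, or null if the HTML has none. */
    public static String parseTitle(String html) {
        final StringBuilder title = new StringBuilder();
        // state[0] = currently inside <title>, state[1] = a title has been fully read
        final boolean[] state = new boolean[2];

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.TITLE && !state[1]) {
                    state[0] = true;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (state[0]) {
                    title.append(data);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.TITLE && state[0]) {
                    state[0] = false;
                    state[1] = true;
                }
            }
        };

        try {
            // "true" ignores charset directives, since the input is already decoded text
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException ex) {
            // Not expected for an in-memory reader; rethrown unchecked to keep the signature simple
            throw new UncheckedIOException(ex);
        }
        return state[1] ? title.toString().trim() : null;
    }
}
```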
Bruno Eberhard

You can read the HTML page title with JavaScript, store it in a hidden form field, and later retrieve it in the servlet from the HTTP request parameter, as below:

HTML Page:

<!DOCTYPE html>
<html>
<head>
<meta charset="ISO-8859-1">
<title>My page</title>
</head>
<body>
<form action="a" onsubmit="return setPageTitle()" method="post">
    <input type="hidden" name="pageTitle" id="pageTitle">
    <input type="submit" value="Go"/>
</form>
<script type="text/javascript">
   function setPageTitle(){
     document.getElementById("pageTitle").value=document.title;
    }
</script>   
</body>
</html>

Servlet Code:

String title=request.getParameter("pageTitle");
Rohit Gaikwad