
My Java class will receive a String that could be JSON, HTML, or plain text. I need to detect which of the three it is from the String itself.

Apache Tika does this, but only detects the type from a File object. When I pass it a String object it returns "application/octet-stream" as the type (for all types), which is incorrect.

Until now, we have only had to detect whether the String was HTML or plain text. As the code sample below shows, that only meant searching for obvious HTML tags. Now we need to scan the String and determine whether it is HTML, JSON, or plain text.

I would love to use a third-party library if one exists that can detect the type from a String object.

public static final String[] HTML_STARTS = {
    "<html>",
    "<!--",
    "<!DOCTYPE",
    "<?xml",
    "<body"
};
Shane

2 Answers

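import java.util.Arrays;
import java.util.List;

// Heuristic: treat the trimmed text as JSON when it is wrapped in { ... }.
public static boolean isJSON(String str)
{
    str = str.trim();
    // Guard against empty input before reading the first and last characters.
    return !str.isEmpty()
            && str.charAt(0) == '{'
            && str.charAt(str.length() - 1) == '}';
}


// Heuristic: treat the text as HTML when it contains any of these markers.
public static boolean isHTML(String str)
{
    List<String> htmlTags = Arrays.asList(
                                "<html>",
                                "<!--",
                                "<!DOCTYPE",
                                "<?xml",
                                "<body"
                            );

    return htmlTags.stream().anyMatch(str::contains);
}

public static final int IS_PLAIN = 0;
public static final int IS_HTML = 1;
public static final int IS_JSON = 2;

// JSON is checked first, so JSON that embeds HTML markers is still reported as JSON.
public static int getType(String str)
{
    if(isJSON(str)) return IS_JSON;
    else if(isHTML(str)) return IS_HTML;
    else return IS_PLAIN;
}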
Stijn Leenknegt
    Very simplistic... some code may or may not accept JSON without an enclosing object. Your HTML detection will also classify any text that happens to contain one of these tags as HTML, even if it's embedded in JSON or happens to appear in plain text. Lastly, you just dump the code here, without any explanation. New programmers can just copy&paste it, but learning is limited. You could improve the post by explaining your code. – Robert Mar 27 '19 at 22:00
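
A minimal sketch of how the points in this comment might be addressed while staying with plain heuristics. It accepts a top-level JSON array as well as an object, and it only reports HTML when the text starts with a marker, so a tag mentioned in the middle of plain text is not enough. The class name `TextTypeHeuristics`, the `TextType` enum, and the method names are hypothetical. The prefix list is adapted from the question, with `<html` shortened so that a tag with attributes still matches (an assumption). This is still guesswork: text such as `[see note]` would be classified as JSON, so only a real parser can give a definite answer.

```java
import java.util.Locale;

final class TextTypeHeuristics {
    enum TextType { JSON, HTML, PLAIN }

    // Lower-case prefixes adapted from the question's HTML_STARTS (assumption: "<html" without ">").
    private static final String[] HTML_STARTS = {
        "<html", "<!--", "<!doctype", "<?xml", "<body"
    };

    // True when the trimmed text is wrapped in { ... } or [ ... ].
    static boolean looksLikeJson(String s) {
        String t = s.trim();
        if (t.length() < 2) {
            return false;
        }
        char first = t.charAt(0);
        char last = t.charAt(t.length() - 1);
        return (first == '{' && last == '}') || (first == '[' && last == ']');
    }

    // True only when the trimmed text starts with one of the markers (case-insensitive).
    static boolean looksLikeHtml(String s) {
        String t = s.trim().toLowerCase(Locale.ROOT);
        for (String prefix : HTML_STARTS) {
            if (t.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // JSON is tested first so that JSON carrying HTML fragments is not misreported.
    static TextType detect(String s) {
        if (s == null) {
            return TextType.PLAIN;
        }
        if (looksLikeJson(s)) {
            return TextType.JSON;
        }
        if (looksLikeHtml(s)) {
            return TextType.HTML;
        }
        return TextType.PLAIN;
    }
}
```

Calling `TextTypeHeuristics.detect(input)` returns one of the three enum values, which avoids the magic `int` constants.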

Rather than matching prefixes by hand, you can hand the String to a real parser: Jackson or Gson for JSON, and jsoup for HTML.
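
One way this could be wired up is to try the JSON check first, fall back to the HTML check, and treat everything else as plain text. The sketch below keeps the parsers behind `Predicate<String>` parameters so that it compiles without any third-party library. `ContentSniffer`, `Kind`, and `detect` are hypothetical names, and the predicates used in the usage note are stand-ins, not calls into Jackson, Gson, or jsoup.

```java
import java.util.function.Predicate;

final class ContentSniffer {
    enum Kind { JSON, HTML, PLAIN }

    private final Predicate<String> jsonCheck;
    private final Predicate<String> htmlCheck;

    // Both checks are supplied by the caller, e.g. a lambda that wraps a parser
    // call in try/catch and returns false when the parser rejects the input.
    ContentSniffer(Predicate<String> jsonCheck, Predicate<String> htmlCheck) {
        this.jsonCheck = jsonCheck;
        this.htmlCheck = htmlCheck;
    }

    // Order matters: JSON first, then HTML, then plain text as the default.
    Kind detect(String text) {
        if (text == null || text.trim().isEmpty()) {
            return Kind.PLAIN;
        }
        String trimmed = text.trim();
        if (jsonCheck.test(trimmed)) {
            return Kind.JSON;
        }
        if (htmlCheck.test(trimmed)) {
            return Kind.HTML;
        }
        return Kind.PLAIN;
    }
}
```

For a quick trial, something like `new ContentSniffer(t -> t.startsWith("{"), t -> t.startsWith("<"))` is enough. With Jackson on the classpath, the JSON predicate could instead wrap `ObjectMapper.readTree(String)`, which throws on malformed input. HTML parsers tend to be lenient and accept almost any text, so the HTML predicate may still need a heuristic such as looking for known tags.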

Dan Forbes