My Java class receives a String that could be JSON, HTML, or plain text. I need to detect which of the three it is from the String alone.
Apache Tika does this, but as far as I can tell it only detects the type from a File object. When I pass it a String, it returns "application/octet-stream" for every kind of content, which is incorrect.
Until now, we only had to detect whether the String was HTML or plain text, so searching for the obvious HTML openers in the code sample below was enough. Now we need to scan the String and decide whether it is HTML, JSON, or plain text.
I would love to use a third-party library if one exists that can detect the type from a String object.
    // Prefixes we currently treat as "this String is HTML".
    public static final String[] HTML_STARTS = {
        "<html>",
        "<!--",
        "<!DOCTYPE",
        "<?xml",
        "<body"
    };
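
For reference, here is a rough sketch of the kind of three-way check I have in mind, using only the JDK. The names `ContentSniffer`, `Kind`, `detect`, `looksLikeJson`, and `TAG` are hypothetical and come from no library. The JSON test is only a structural heuristic (matching outer brackets, plus a quoted key for objects), not real validation; parsing with an actual JSON library would be needed for a strict answer. The HTML test reuses the prefixes above in lower case, with `<html` instead of `<html>` so that tags with attributes also match, and falls back to a simple tag regex that I am assuming is good enough for our inputs.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class ContentSniffer {

    public enum Kind { JSON, HTML, PLAIN_TEXT }

    // Lower-cased variants of HTML_STARTS; "<html" also matches <html lang="en">.
    private static final String[] HTML_STARTS = {
        "<html", "<!--", "<!doctype", "<?xml", "<body"
    };

    // Assumed fallback: any opening or closing tag such as <p>, </div>, <br/>.
    private static final Pattern TAG =
        Pattern.compile("</?[a-zA-Z][a-zA-Z0-9]*(\\s[^<>]*)?/?>");

    public static Kind detect(String input) {
        if (input == null) {
            return Kind.PLAIN_TEXT;
        }
        String s = input.trim();
        if (s.isEmpty()) {
            return Kind.PLAIN_TEXT;
        }
        if (looksLikeJson(s)) {
            return Kind.JSON;
        }
        String lower = s.toLowerCase(Locale.ROOT);
        for (String start : HTML_STARTS) {
            if (lower.startsWith(start)) {
                return Kind.HTML;
            }
        }
        // No known prefix: treat the String as HTML if it contains any tag.
        return TAG.matcher(s).find() ? Kind.HTML : Kind.PLAIN_TEXT;
    }

    // Heuristic only: checks the outer brackets, not the full JSON grammar.
    private static boolean looksLikeJson(String s) {
        char first = s.charAt(0);
        char last = s.charAt(s.length() - 1);
        if (first == '[' && last == ']') {
            return true;
        }
        if (first == '{' && last == '}') {
            // In a JSON object, '{' is followed by a quoted key or by '}'.
            String inner = s.substring(1).trim();
            return inner.startsWith("\"") || inner.startsWith("}");
        }
        return false;
    }
}
```

With this, `ContentSniffer.detect(body)` would return one of the three `Kind` values. Plain text that happens to be wrapped in square brackets would be misreported as JSON, which is one reason I would still prefer a library that does this properly.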