How can I check whether the user entered a plain string or a string containing HTML markup in Java, without using regular expressions?

For example:

 String str = "Jack is sleeping";
 String HtmlString = "<html><head></head><body>Jack is jumping</body></html>";
Stefan Endrullis
prabh
  • What is an HTML string? Does it have to be well formed with correct HTML syntax? Is `test` an HTML string? – assylias Mar 20 '19 at 18:34
  • Perhaps you could just check whether the string starts with `<html>` and ends with `</html>`? Or do you need to check that it's proper HTML all the way through? – Dawood ibn Kareem Mar 20 '19 at 18:37
  • Related https://stackoverflow.com/a/3154281/2804966 – lalo Mar 20 '19 at 18:41
  • You could just use a parser to parse the string as HTML and if that fails then just treat it as a normal string. – xtratic Mar 20 '19 at 18:46
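
The prefix/suffix check suggested in the comments could be sketched roughly as follows. This is only an illustration: the class and method names are made up, and it assumes the markers to look for are `<html` at the start and `</html>` at the end, ignoring case and surrounding whitespace.

```java
import java.util.Locale;

// Hypothetical helper; the names are illustrative only.
final class HtmlEnvelope {

    // True if s, after trimming, starts with "<html" and ends with "</html>".
    // "<html" (without '>') is used so that <html lang="en"> also matches.
    static boolean looksLikeHtmlDocument(String s) {
        String t = s.trim().toLowerCase(Locale.ROOT);
        return t.startsWith("<html") && t.endsWith("</html>");
    }
}
```

This says nothing about what lies between the two markers, and it rejects fragments such as `<b>Jack</b>` as well as documents that begin with a DOCTYPE declaration.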

2 Answers

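As a proof of concept: if you want to check whether a String contains valid markup, try parsing it. Note that DocumentBuilder is an XML parser, so this really tests for well-formed XML; XHTML-style input like the example passes, but HTML with unclosed tags such as <br> is rejected.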

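import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class HtmlCheck {
    public static void main(String[] args) throws Exception {
        DocumentBuilder documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

        String a = "Jack is sleeping";
        String htmlString = "<html><head></head><body>Jack is jumping</body></html>";

        // Well-formed markup parses without throwing
        documentBuilder.parse(new InputSource(new StringReader(htmlString)));
        System.out.println(String.format("\"%s\" is a valid HTML string", htmlString));

        try {
            // Plain text has no root element, so the parser throws
            documentBuilder.parse(new InputSource(new StringReader(a)));
        } catch (SAXParseException spe) {
            System.out.println(String.format("\"%s\" is NOT a valid HTML string", a));
        }
    }
}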

The HtmlString in your example isn't valid: it is missing the final >, which is fixed in the example above.
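
If you need this as a reusable yes/no check, the parsing idea could be wrapped in a method along these lines. This is a minimal sketch, not a definitive implementation: the class and method names are hypothetical, and it still only tests XML well-formedness, not HTML validity. It also installs a `DefaultHandler` as the error handler, which keeps the parser from printing its own error message to `System.err` when parsing fails.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; the names are illustrative only.
final class MarkupCheck {

    // Returns true if s parses as a well-formed XML document.
    // XHTML-style markup passes; HTML with unclosed tags such as <br> does not.
    static boolean isWellFormedXml(String s) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default error handler, which prints to System.err
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(s)));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // Any parse failure is treated as "not markup"
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedXml("Jack is sleeping"));
        System.out.println(isWellFormedXml("<html><head></head><body>Jack is jumping</body></html>"));
    }
}
```

With the two strings from the question, this prints `false` and then `true`. Since the input comes from a user, you may also want to configure the factory to disallow DOCTYPE declarations before parsing untrusted text.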

Misantorp

You could check the string to see if it contains substrings that look like HTML tags:

// Check if a string contains HTML-like '<[/]abc[/]>' substrings
public static boolean containsHtmlTags(String s)
{
    int sLen = s.length();
    int p = 0;

    // Look for '<[/]abc[/]>' substrings
    while (p < sLen)
    {
        // Find the next '<'; stop if there is none
        p = s.indexOf('<', p);
        if (p < 0)
            break;
        p++;

        // Optional leading '/' (closing tag)
        if (p < sLen  &&  s.charAt(p) == '/')
            p++;

        // Tag name: one or more letters
        int nameStart = p;
        while (p < sLen  &&  Character.isLetter(s.charAt(p)))
            p++;
        boolean hasName = (p > nameStart);

        // Optional trailing '/' (self-closing tag)
        if (p < sLen  &&  s.charAt(p) == '/')
            p++;

        // A tag needs a name followed by '>'. On a mismatch, rescan from p
        // without skipping it, since p may be the '<' of a real tag
        if (hasName  &&  p < sLen  &&  s.charAt(p) == '>')
            return true;
    }

    // No '<[/]abc[/]>' substring was found
    return false;
}

This is not perfect, but it looks for substrings that look like HTML element tags, such as <foo>, </foo>, or <foo/>. If the string contains at least one such substring, the method returns true.

Note that this is a very simple scanner; it does not check for HTML attributes or spaces within tags, or matching opening and closing tag names. For that level of sophistication, you would be better off just using regular expressions or an HTML parser.
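
To give an idea of what handling attributes would involve, here is a rough sketch of a variant that also accepts a tag name followed by whitespace and a later '>'. It is a hypothetical extension, not part of the method above: the class and method names are made up, digits are allowed after the first letter so that names like h1 match, and it will be fooled by '<' or '>' inside quoted attribute values.

```java
// Hypothetical variant of the scanner; the names are illustrative only.
final class TagScanner {

    // True if s contains '<', an optional '/', a name that starts with a letter,
    // and then '>', '/>', or whitespace followed by a '>' before the next '<'.
    static boolean containsTagWithAttributes(String s) {
        int len = s.length();
        int p = 0;
        while (p < len) {
            p = s.indexOf('<', p);
            if (p < 0) break;
            p++;
            if (p < len && s.charAt(p) == '/') p++;

            // The name must start with a letter; otherwise rescan from p
            if (p >= len || !Character.isLetter(s.charAt(p))) continue;
            while (p < len && Character.isLetterOrDigit(s.charAt(p))) p++;
            if (p >= len) break;

            char next = s.charAt(p);
            if (next == '>') return true;
            if (next == '/' && p + 1 < len && s.charAt(p + 1) == '>') return true;
            if (Character.isWhitespace(next)) {
                // Treat everything up to the next '>' as attributes,
                // unless another '<' appears first
                int close = s.indexOf('>', p);
                int open = s.indexOf('<', p);
                if (close >= 0 && (open < 0 || close < open)) return true;
            }
        }
        return false;
    }
}
```

For example, `containsTagWithAttributes("<a href=\"x\">link</a>")` returns true, while `"1 < 2"` and `"Jack is sleeping"` return false. Text such as `"x <b and y> z"` is still misreported as a tag, which is the kind of case where a real HTML parser is the safer choice.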

David R Tribble