Regular Expression, how to validate html ul li

Question

Is there a way to validate that a piece of HTML contains a ul with several li elements? (I am using Groovy.) I need this to check, in a unit test, that the generated HTML is valid.

I would like to:

Extract the number of li elements, along with their content
Validate that the ul is inside a specified div with a specified class. That div may contain other HTML tags, but I don't care about them.

Example text: <div class="example">...<ul><li>Element1</li><li>Element2</li><ul>...</div>

I have tried the simplest approach: <li>.+?</li> . With this I am able to extract the li elements, but I also need to check that the ul/div structure is valid.

Why is this not working? <div class='example'>.+?<ul>(<li>.+?</li>)*<ul>.+?</div> How should it be written?

Any tips?

Thanks a lot

Don't use regex to parse HTML - it is not suitable. Use a dedicated HTML parser for whatever language/platform you are using (can't recommend any as you didn't specify). — Oded, Sep 29 '12 at 19:10
I just need to check that the list is created properly inside that div — user829882, Sep 29 '12 at 19:14
@user829882, you asked for "any tips" and one of the most respected developers on Stack Overflow gives you advice, which you promptly ignore. Do you think you might want to reconsider Oded's guidance? — Mike Pennington, Sep 30 '12 at 00:59
@MikePennington I don't need an HTML validator, just a way to validate that small portion of HTML. It is generated in a method and I want to test that, not the whole page. The links I have seen only validate the HTML itself, not the "semantics" of the structure, I mean the ul inside that div with that class — user829882, Sep 30 '12 at 10:13
@user829882, parsing != validating. Oded told you to use an html parser. — Mike Pennington, Sep 30 '12 at 11:22
This might be a dumb question, but why do you use a unit test to test your view (=html)? You'd rather use functional or integration tests for that, which, I guess, bring a better way to test HTML than regular expressions — Raul Pinto, Oct 01 '12 at 07:50

score 4 · Accepted Answer · answered Sep 29 '12 at 19:15

Using an HTML scraping library such as jsoup is easier and more fun than using plain regex. Since jsoup is a Java library, you should be able to use it from Groovy.

answered Sep 29 '12 at 19:15

pram

score 1 · Answer 2 · answered Sep 30 '12 at 01:11

You can parse it as XML and count the elements:

def html = '''
  <html>
    <ul>
      <li>item 1</li>
      <li>item 2</li>
      <li>item 3</li>
      <li>item 4</li>
    </ul>
  </html>'''

def htmlNode = new XmlParser().parseText html

assert htmlNode.ul.li.size() == 4

If the HTML is not well-formed (for example, tags are left unclosed), you can use a library like NekoHTML to normalize it first.
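
The same parser can also cover the structure check from the question. What follows is only a minimal sketch under a few assumptions: the fragment is well-formed XML (the question's example, with its unclosed ul, would need fixing or normalizing first), Groovy 3 or later is in use so that `groovy.xml.XmlParser` is available, and the helper name `listItemsIn` is hypothetical.

```groovy
import groovy.xml.XmlParser   // Groovy 3+; on Groovy 2.x groovy.util.XmlParser is imported by default

// Hypothetical helper: returns the text of every <li> in a <ul> that sits
// directly inside the root <div> carrying the expected class.
// Returns an empty list when the root is not such a div.
List<String> listItemsIn(String fragment, String expectedClass) {
    def root = new XmlParser().parseText(fragment)
    if (root.name() != 'div' || root.attribute('class') != expectedClass) {
        return []
    }
    // Other children of the div (text, other tags) are simply ignored.
    root.ul.li*.text()
}

def fragment = '''
<div class="example">
  <p>anything else</p>
  <ul><li>Element1</li><li>Element2</li></ul>
</div>'''

def items = listItemsIn(fragment, 'example')
assert items.size() == 2
assert items == ['Element1', 'Element2']
```

Because the helper returns the item texts rather than a boolean, one call gives a unit test both the count and the content to assert on.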

answered Sep 30 '12 at 01:11

Will

The problem is that it's a unit test, so I just want to test the portion of HTML generated by my method. If I used a whole-page HTML validator, it would fail. I thought a regex would make it easier to test that there is a div with that class, and that inside that div there is a ul with the expected li elements. – user829882 Sep 30 '12 at 10:12
And what's the problem with using `XmlParser` for it? :-) I still think your test will be easier using XmlParser – Will Sep 30 '12 at 13:03
Why don't you write your unit test for the thing that creates that div instead of the entire page then? It seems like you're making this way harder than it has to be. – Justin Piper Sep 30 '12 at 16:15

score 1 · Answer 3 · answered Oct 01 '12 at 03:01

Using jsoup, consider the test below.

Note:

It does not use regular expressions, since they are a poor fit for this task, per the other answers.
The verifyHtml() method accepts a fragment of HTML.

Example:

import groovy.util.*
import org.jsoup.*
import org.jsoup.nodes.* 
import org.jsoup.select.* 

class HtmlTester extends GroovyTestCase {
    // returns true if fragment has:
    // <div class='list'> <ul> <li> ... </li> </ul> </div>
    def verifyHtml(String htmlFragment) {
        Document doc = Jsoup.parse(htmlFragment)
        // select every <li> nested under a <ul> inside <div class='list'>
        Elements items = doc.select("div.list ul li")

        return items.size() > 0
    }

    void testDivNoClass() {
        def htmlDivNoClass = "<div><ul><li>list 1</li></ul></div>"        
        assertFalse verifyHtml(htmlDivNoClass)
    }

    void testDivNoUl() {
        def htmlDivNoUl = "<div class='list'></div>"        
        assertFalse verifyHtml(htmlDivNoUl)
    }

    void testDivUlNoLi() {
        def htmlDivUlNoLi = "<div class='list'><ul></ul></div>"        
        assertFalse verifyHtml(htmlDivUlNoLi)
    }

    void testWithGoodHtml() {
        def html = """
        <div class='list'>
            <ul>
                <li>list 1</li>
                <li>list 2</li>
            </ul>
        </div>
        """    
        assertTrue verifyHtml(html)
    }    
}
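
The question also asks for the number of li elements and their content, while verifyHtml() only reports whether at least one exists. One possible extension is sketched here, assuming jsoup can be fetched with @Grab (the version shown is just an example) and using a hypothetical helper name, `listItemTexts`:

```groovy
@Grab('org.jsoup:jsoup:1.15.3')   // assumed version; any recent jsoup release should do
import org.jsoup.Jsoup

// Hypothetical helper: text of each <li> in a <ul> that is a direct child
// of a <div> with the given class. The '>' child combinators keep nested
// lists and <li> elements elsewhere in the fragment from being counted.
List<String> listItemTexts(String htmlFragment, String cssClass) {
    def selector = 'div.' + cssClass + ' > ul > li'
    Jsoup.parseBodyFragment(htmlFragment).select(selector).collect { it.text() }
}

def html = "<div class='list'><p>intro</p><ul><li>list 1</li><li>list 2</li></ul></div>"
def items = listItemTexts(html, 'list')

assert items.size() == 2
assert items == ['list 1', 'list 2']
```

A test can then assert on both the count and the exact item text, which a yes/no check cannot do.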
