I need to compile a pattern that extracts the text of a span, but I can't get the result I want. Maybe the problem is the way the pattern is written; I know I got something not quite right. This is my HTML code:

<span class="libelleAttributPageParametrage"> 
 "Libellé de facturation"
<font color="#C60307">*</font>
</span>

And this is my Java code:

 public List<String> getAllSpan()
{
    String HTMLSource = priceSelenium.getHtmlSource();
    priceSelenium.getBodyText();
    List<String> ListOfSpan = new ArrayList<String>();
    Pattern p = Pattern.compile( "<SPAN[^>]*>([\\w\\d\\s\\n\\r()/°@\\.\\-àáâãäåçèéêëìíîïðòóôõöùúûüýÿ]*)</SPAN>" );
    Matcher m = p.matcher( HTMLSource );
    while ( m.find() )
    {
        if ( !m.group( 1 ).isEmpty() )
        {
            ListOfSpan.add( m.group( 1 ) );
        }
    }
    return ListOfSpan;
}

What I need to end up with in my ListOfSpan is: "Libellé de facturation"

Thanks in advance.

asmae
    Please refrain from parsing HTML with RegEx as it will [drive you į̷̷͚̤̤̖̱̦͍͗̒̈̅̄̎n̨͖͓̹͍͎͔͈̝̲͐ͪ͛̃̄͛ṣ̷̵̞̦ͤ̅̉̋ͪ͑͛ͥ͜a̷̘͖̮͔͎͛̇̏̒͆̆͘n͇͔̤̼͙̩͖̭ͤ͋̉͌͟eͥ͒‌​͆ͧͨ̽͞҉̹͍̳̻͢](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). Use an HTML parser instead – HashimR Aug 02 '12 at 08:31

1 Answer

If you want to parse HTML, you should use an HTML parser library (such as jsoup). This will give you an object graph representing the HTML, in which you can navigate to the <span> element you're interested in and call something like spanElem.ownText() to read its text. (spanElem.attr("name") would only return a name attribute, which the span in the question doesn't have.)
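
A rough sketch of that "parse, then navigate" idea, written against the JDK only so it runs without extra dependencies. It swaps in the JDK's XML DOM parser as a stand-in for jsoup, which works here only because the fragment from the question happens to be well-formed XML; real-world HTML often is not. The names `SpanTextExtractor` and `ownTextOfSpans` are hypothetical. With jsoup, the same navigation would be roughly `Jsoup.parse(html).select("span")` followed by `ownText()` on each element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SpanTextExtractor {

    // Hypothetical helper: returns the text that belongs directly to each
    // <span>, ignoring text inside child elements such as <font>.
    // Assumption: the markup is well-formed XML (a stand-in for a real HTML parser).
    static List<String> ownTextOfSpans(String wellFormedMarkup) {
        Document doc;
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new InputSource(new StringReader(wellFormedMarkup)));
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalArgumentException("Markup is not well-formed XML", e);
        }

        List<String> result = new ArrayList<>();
        NodeList spans = doc.getElementsByTagName("span");
        for (int i = 0; i < spans.getLength(); i++) {
            StringBuilder own = new StringBuilder();
            NodeList children = spans.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                // Keep only direct text nodes; skip element children.
                if (child.getNodeType() == Node.TEXT_NODE) {
                    own.append(child.getNodeValue());
                }
            }
            String text = own.toString().trim();
            if (!text.isEmpty()) {
                result.add(text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The fragment from the question; the e-acute is written as a Unicode escape.
        String html = "<span class=\"libelleAttributPageParametrage\"> \n"
                + " \"Libell\u00e9 de facturation\"\n"
                + "<font color=\"#C60307\">*</font>\n"
                + "</span>";
        System.out.println(ownTextOfSpans(html));
    }
}
```

For the fragment in the question, the returned list holds a single entry, "Libellé de facturation", including the literal quotes that appear in the markup. The asterisk inside <font> is skipped because it belongs to a child element.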

HTML is not a regular language, and so treating it as text and trying to extract parts with regexes is not strictly possible. It might work for a while in simple cases, but it's still likely to involve an overly complex regex, which will fail for some valid HTML. That way lies madness.
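
To make the "works in simple cases, fails on valid HTML" point concrete, here is a small sketch that runs the pattern from the question against two inputs. The long list of accented letters is abbreviated to the range U+00E0 to U+00FF for brevity (an assumption, not the exact original class), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexSpanPitfall {

    // The question's pattern; accented letters abbreviated to U+00E0..U+00FF,
    // and the degree sign written as a Unicode escape.
    static final Pattern SPAN = Pattern.compile(
            "<SPAN[^>]*>([\\w\\d\\s\\n\\r()/\u00b0@\\.\\-\u00e0-\u00ff]*)</SPAN>");

    // Same loop as getAllSpan() in the question, minus the Selenium call.
    static List<String> spanTexts(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = SPAN.matcher(html);
        while (m.find()) {
            if (!m.group(1).isEmpty()) {
                found.add(m.group(1));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // A deliberately simple case: uppercase tag, plain text, no child elements.
        String simple = "<SPAN>Libell\u00e9</SPAN>";

        // The fragment from the question.
        String fromQuestion = "<span class=\"libelleAttributPageParametrage\"> \n"
                + " \"Libell\u00e9 de facturation\"\n"
                + "<font color=\"#C60307\">*</font>\n"
                + "</span>";

        System.out.println(spanTexts(simple).size());       // 1
        System.out.println(spanTexts(fromQuestion).size()); // 0
    }
}
```

The second input yields nothing for three independent reasons: the pattern is case-sensitive and the tag is lowercase, the text contains quote characters that are not in the character class, and the span contains a nested <font> element. Patching each one makes the regex longer without making it safe for the next variation.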

Andrzej Doyle