I’m trying to extract the font face name, e.g. from:

String htmlContent = "<font face=\"impact\">Hdjdjdisid <font style=\"background-color:#ff0000\"> shejej</font></font>";

to:

impact

This is what I found on the web, but it returns the entire attribute list of the tag, and I want only the face name.

String pattern = "<FONT (.*?)>";

Pattern p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(htmlContent);

if (m.find()) {
    // prints the whole opening tag: <font face="impact">
    System.out.println(m.group());

    // prints everything between "<font " and ">": face="impact"
    System.out.println(m.group(1));
}

How can I extract only the face name?

Sebastian Simon
matt
  • Why don't you use an HTML parser such as jsoup? – fge Jul 15 '15 at 09:01
  • *[Even Jon Skeet cannot parse HTML using regular expressions.](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454)* – Selvin Jul 15 '15 at 09:02
  • jsoup extract elements: http://stackoverflow.com/questions/19831558/jsoup-how-to-extract-every-elements – SatyaTNV Jul 15 '15 at 09:09

1 Answer

In this simple case, adjust your pattern like this:

<font[^>]+face="([^"]+)"

Escaped for use in Java:

String pattern = "<font[^>]+face=\"([^\"]+)\"";
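As a minimal sketch of how that pattern could be used, the snippet below reads capture group 1, which holds only the attribute value. The class name FontFaceExtractor and the helper extractFontFace are hypothetical names chosen for illustration. It assumes the face value is wrapped in double quotes and sits inside the same tag, and it keeps Pattern.CASE_INSENSITIVE from the question so that <FONT FACE="..."> matches as well.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FontFaceExtractor {
    // Group 1 captures the value of a double-quoted face attribute.
    // CASE_INSENSITIVE lets the same pattern match <FONT FACE="..."> too.
    private static final Pattern FACE_PATTERN =
            Pattern.compile("<font[^>]+face=\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: face of the first font tag that has one, if any.
    static Optional<String> extractFontFace(String html) {
        Matcher m = FACE_PATTERN.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String htmlContent = "<font face=\"impact\">Hdjdjdisid "
                + "<font style=\"background-color:#ff0000\"> shejej</font></font>";
        // prints: impact
        System.out.println(extractFontFace(htmlContent).orElse("(no face attribute)"));
    }
}
```

Note that m.group() would still return the whole matched text; only m.group(1) is the bare face name. Single-quoted or unquoted attribute values are not handled by this sketch.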

But as others pointed out: don't parse HTML with regex.

f1sh