I'm trying to change some HTML text using a regex. I have this:

<font class="Apple-style-span" color="#0036ff">This is a text</font>

Using this regex: .replaceAll("(<font.+)\\S*color+.*?(>)","<span class=\"c2f\">");

It becomes: <span class="c2f">This is a text</span> (I use another replace call to change the closing </font> tag)

But the problem happens when I have nested font tags like this:

<font class="Apple-style-span" color="#0036ff">This is a text AND <font class="Apple-style-span" color="#0036ff">This is Edited TOOO</font></font>

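It becomes: <span class="c2f">This is Edited TOOO</span></span> (the closing </font> tags are changed with a separate replace too)

I can understand why it happens (the greedy match swallows everything up to the last color attribute), but I don't know how to make it match each tag separately.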

What I want:

<span class="c2f">This is a text AND <span class="c2f">This is Edited TOOO</span></span>

Is this possible, or do I need another approach?

rhens
user2582318
  • Have you considered using an actual HTML parser, rather than [attempting to use a regular expression](http://stackoverflow.com/a/1732454/115145)? – CommonsWare Nov 19 '15 at 17:02
  • @CommonsWare Actually I do, but sometimes I will receive only a few paragraphs with edited lines. Now that you mention it, is it possible for Jsoup to parse only a small part of the HTML? I will try it, thank you. – user2582318 Nov 19 '15 at 17:05
  • I have not used JSoup specifically. However, usually HTML parsers are fairly forgiving to oddly-formed HTML, since a lot of Web pages are odd. :-) – CommonsWare Nov 19 '15 at 17:08

1 Answer

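Just a slight change to your code: make the first quantifier lazy (.+? instead of .+), so that each match stops at the first color attribute it can reach instead of running on to the last one.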

.replaceAll("(<font.+?)\\S*color+.*?(>)","<span class=\"c2f\">");
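To see the difference, here is a minimal sketch that runs both patterns on the nested input from the question. The class name FontToSpan, the convert helper, and the plain replace of the closing </font> tag are assumptions made for illustration; the question only says the closing tags are changed by a separate replace.

```java
public class FontToSpan {
    // Greedy pattern from the question: ".+" runs to the last "color" on the line.
    static final String GREEDY = "(<font.+)\\S*color+.*?(>)";
    // Lazy pattern from the answer: ".+?" stops at the first "color" it can reach.
    static final String LAZY = "(<font.+?)\\S*color+.*?(>)";

    // Hypothetical helper: rewrite opening tags with the given pattern,
    // then turn every closing </font> into </span>.
    static String convert(String openTagPattern, String html) {
        return html
                .replaceAll(openTagPattern, "<span class=\"c2f\">")
                .replace("</font>", "</span>");
    }

    public static void main(String[] args) {
        String nested =
                "<font class=\"Apple-style-span\" color=\"#0036ff\">This is a text AND "
              + "<font class=\"Apple-style-span\" color=\"#0036ff\">This is Edited TOOO</font></font>";

        // Greedy: one match spans both opening tags, so the outer text is lost.
        System.out.println(convert(GREEDY, nested));
        // Lazy: each opening tag is matched and replaced on its own.
        System.out.println(convert(LAZY, nested));
    }
}
```

Note that the lazy pattern can still run across tags when a font tag has no color attribute at all, because .+? keeps expanding until it finds one. If that can occur in your input, a pattern restricted to a single tag, such as "<font\\b[^>]*\\bcolor[^>]*>", or the HTML parser suggested in the comments, is probably the safer choice.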
mihirjoshi