I'm trying to pseudo-translate the text embedded within HTML in a string. I don't want to touch the actual HTML tags or their attributes, just the content.
So for example, if I have something like:
<td colspan='2'><a>This is a Text in <b>Bold</b></a></td>
I want it to eventually be modified into:
<td colspan='2'><a>Thìs ís à Tèxt îñ <b>Bòlð</b></a></td>
1) I can't use any third-party libraries, so I'm using standard regex to parse the HTML.
2) I tried both pattern.match() and pattern.split(), but both seem to have limitations. pattern.split() splits the string on a regex pattern, but I lose the matched pattern (the tags) in the process. pattern.match() retains the pattern, but I can't guarantee the markup.
So ideally I would want something that takes the string with HTML and breaks it into an array like:
array[0]: HTML Tag
array[1]: Plain Text
array[2]: HTML Tag
array[3]: Plain Text
array[4]: HTML Tag
array[5]: Plain Text
array[6]: HTML Tag
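
To make the desired split concrete, here is a minimal sketch, assuming Java's java.util.regex and assuming that a tag is simply everything from `<` to the next `>` (so comments, script blocks, or a `>` inside an attribute value would break it). The names `HtmlTokenizer` and `tokenize` are hypothetical, made up for this illustration. It walks the matches with `Matcher.find()` and keeps both the tags and the text between them:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HtmlTokenizer {
    // Assumption: a tag is anything from '<' up to the next '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Splits the input into tag tokens and text tokens, in document order.
    static List<String> tokenize(String html) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        int last = 0; // end of the previous tag match
        while (m.find()) {
            // Any characters between the previous tag and this one are plain text.
            if (m.start() > last) {
                tokens.add(html.substring(last, m.start()));
            }
            tokens.add(m.group()); // the tag itself, kept unchanged
            last = m.end();
        }
        // Trailing text after the last tag, if any.
        if (last < html.length()) {
            tokens.add(html.substring(last));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String html = "<td colspan='2'><a>This is a Text in <b>Bold</b></a></td>";
        for (String token : tokenize(html)) {
            System.out.println(token);
        }
    }
}
```

With this approach the tokens do not strictly alternate between tag and text, since adjacent tags such as `</b></a></td>` come out as consecutive entries. A token that starts with `<` would be left alone, and only the other tokens would be pseudo-translated before joining everything back together.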
Any ideas?