Finding repeating <br>'s between a certain <span>

Question

I have the following HTML string:

<span class='together'>line one,<br><span class='indent'>line two.</span><br>Line three,<br><span class='indent'>line four,<br>line five,<br>line six,<br>line seven;<br>line eight.<br>Line nine;<br>line ten,<br>line eleven,<br>line twelve.</span><br>Line thriteen,<br><span class='indent'>line fourteen,<br>line fifteen,<br>line sixteen,<br>line seventeen,<br>line eighteen.</span></span>

I am trying to find a regular expression that will match all the <br>'s that are between a <span class='indent'> and its closing </span>. The <span class='together'> encapsulates the whole string and should just be ignored.

At the moment the best I can do is: <span class='indent'>.*?(<br>).*?<\/span> which doesn't work at all. The first <br> this grabs is outside of the <span class='indent'> and then it skips over a bunch of other <br>'s that I want (See here).

Is this possible? Should I instead use <span class='indent'>(.*?)\<\/span> and then parse the captured group later?

As you can tell my regex knowledge is pretty limited.

[**DON'T PARSE HTML WITH REGEX**](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) — Jonathon Reinhart, Nov 08 '15 at 09:01
@hjpotter92 this in Java for an android app. I'm getting this content from a JSON dump and then trying to put the (formatted) string in a TextView. The Html.fromHtml ignores spans and I don't think you can add your own styles to a textview to pick up spans. — Ampers, Nov 08 '15 at 09:09

bobble bubble · Accepted Answer · 2015-11-08T12:41:07.620

1

In the comments on the other answer you wrote:

The content between the spans will only have a <br> tag in it and no other HTML...

If there are only <br> tags and no other tags before the closing </span>, try a lookbehind. Java only allows finite repetition inside a lookbehind, so you need to set a limit on the maximum length of the content inside the span.

(?s)(?<=<span class='indent'>(?:(?!</?span).){0,9999}?)<br>

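I just picked 9999; you might need a higher value depending on your input. Demo at regexplanet (click Java). The negative lookahead in (?:(?!</?span).) stops the lookbehind from stepping over an opening or closing span tag, so a <br> only matches when the nearest span tag before it is <span class='indent'>.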

This only works for data like your sample, not with nested spans. Use a parser in that case.
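For illustration, here is a minimal sketch of how the pattern above might be used from Java. The class name IndentBreaks, the method names, and the short sample string are made up for this example and are not from the question; it also assumes the markup looks like the sample (single-quoted class attribute, no nested spans).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this example only.
class IndentBreaks {
    // The pattern from the answer: a <br> that is preceded by
    // <span class='indent'> with no other span tag in between.
    static final Pattern INDENT_BR = Pattern.compile(
        "(?s)(?<=<span class='indent'>(?:(?!</?span).){0,9999}?)<br>");

    // Counts the <br> tags that sit inside an indent span.
    static int countIndentBreaks(String html) {
        Matcher m = INDENT_BR.matcher(html);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Replaces only the <br> tags inside an indent span with the given text.
    static String replaceIndentBreaks(String html, String replacement) {
        return INDENT_BR.matcher(html)
            .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Shortened, made-up sample in the same shape as the question's string.
        String html = "<span class='together'>a,<br>"
            + "<span class='indent'>b,<br>c,<br>d.</span><br>e</span>";
        System.out.println(countIndentBreaks(html)); // prints 2
        System.out.println(replaceIndentBreaks(html, "\n\t"));
    }
}
```

The two <br> tags outside the indent span are left alone: one has no indent span before it, and the other is separated from it by a </span>, which the negative lookahead refuses to cross.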

edited Nov 08 '15 at 12:41

answered Nov 08 '15 at 10:14

bobble bubble


Thanks for your work bobble bubble. I'm marking this as the correct answer as it does do what I asked. However a parser might be the "correct" way to solve my issue. In fact I ended up using regex to find the contents of my indent span and then did some simple finding and replacing to deal with the <br>'s – Ampers Nov 09 '15 at 02:02

You're welcome @Ampers, thank you! Sounds like you found the optimal way to deal with it. As for parser or regex, I think it depends on the problem and on whether you are parsing arbitrary HTML or your own. – bobble bubble Nov 09 '15 at 03:09
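The two-step approach Ampers describes (grab the contents of each indent span with a regex, then do a plain find-and-replace on the <br>'s inside it) could look roughly like the sketch below. The actual code was not posted, so the class name IndentSpanRewriter, the method name, and the replacement text are assumptions made for illustration. Like the lookbehind version, it assumes indent spans are never nested.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this example only.
class IndentSpanRewriter {
    // Step 1: capture everything between <span class='indent'> and the
    // next </span>. (?s) lets the dot match line breaks as well.
    static final Pattern INDENT_SPAN =
        Pattern.compile("(?s)<span class='indent'>(.*?)</span>");

    // Step 2: inside each captured span body, swap <br> for brReplacement
    // with a literal String.replace; text outside the spans is untouched.
    static String rewrite(String html, String brReplacement) {
        Matcher m = INDENT_SPAN.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String inner = m.group(1).replace("<br>", brReplacement);
            String span = "<span class='indent'>" + inner + "</span>";
            // quoteReplacement keeps '$' and '\' in the text from being
            // treated as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(span));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Shortened, made-up sample in the same shape as the question's string.
        String html = "<span class='together'>a,<br>"
            + "<span class='indent'>b,<br>c,<br>d.</span><br>e</span>";
        System.out.println(rewrite(html, "\n\t"));
    }
}
```

This version needs no length limit such as 9999, because the span body is matched going forward instead of inside a lookbehind.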
