Regex to get from a string in java

Question

Suppose I have

<img class="size-full wp-image-10225" alt="animals" src="abc.jpg"> blah blah blah&nbsp;
<a href="http://en.wikipedia.org/wiki/Elephant">elephant is an animal</a>&nbsp;blah

I want a regex to give me the output :

blah blah blah <a href="http://en.wikipedia.org/wiki/Elephant">elephant is an animal</a> blah

without the &nbsp;. I can do str.replace("&nbsp;","") separately, but how do I get the string starting from "blah blah..." until the final "blah" (which includes the link tag)?

You must remove the `img` tag separately. Do you only need the a tag? That works with a regex. If you also want the text before and after the tag, you will run into problems. Why don't you simply remove the unneeded tags? — Adrian Preuss, Mar 28 '14 at 19:14
I do need the text before the tag as well. So basically I can't use StringUtils.removeHTMLTags(), as it removes all the tags and I want to keep the link tag. What I'm thinking is to locate the first ">" before a href and then capture the text from there till ( inclusive) — user3298846, Mar 28 '14 at 19:17
_Sees regex and HTML in title_ "http://stackoverflow.com/a/1732454/2846923." — The Guy with The Hat, Mar 28 '14 at 19:27
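The asker's non-regex idea (find the ">" just before the a href and keep the text from there) can be sketched with plain String methods. This is a minimal illustration under stated assumptions: the helper name extract is hypothetical, "the first > before a href" is read as the closest > preceding the first "<a href", and the capture runs to the end of the string because the intended end point is unclear in the comment.

```java
class LinkTextSketch {
    // Hypothetical helper: keeps everything after the '>' that closes the tag
    // immediately preceding the first "<a href", then drops &nbsp; entities.
    static String extract(String html) {
        int anchor = html.indexOf("<a href");
        if (anchor < 0) {
            return html; // no link found: return the input unchanged
        }
        // Search backwards from just before the link for the closest '>'
        int close = html.lastIndexOf('>', anchor - 1);
        // If there is no '>' (close == -1), substring(0) keeps the whole string
        String rest = html.substring(close + 1);
        return rest.replace("&nbsp;", "").trim();
    }

    public static void main(String[] args) {
        String html = "<img class=\"size-full wp-image-10225\" alt=\"animals\" src=\"abc.jpg\"> blah blah blah&nbsp;\n"
                + "<a href=\"http://en.wikipedia.org/wiki/Elephant\">elephant is an animal</a>&nbsp;blah";
        System.out.println(extract(html));
    }
}
```

Because &nbsp; is replaced with the empty string, "</a>" and the last "blah" end up adjacent; the line break from the two-line input is kept.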

score 2 · Accepted Answer · answered Mar 28 '14 at 19:16


Maybe something like this?

^<[^>]*>\s*|&nbsp;

Java escaped:

^<[^>]*>\\s*|&nbsp;


^<[^>]*>\s* will match the img tag at the start of the string and any whitespace that follows it, and the alternative &nbsp; matches each non-breaking-space entity. Replace every match with the empty string, "".

You might want to use a proper HTML parser though, since it'll be less likely to break.
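A minimal sketch of applying the answer's pattern in Java, assuming the input is the two-line snippet from the question. The pattern must go through String.replaceAll (which treats its first argument as a regex), not String.replace; the class and method names are illustrative only.

```java
class StripLeadingTag {
    // The answer's pattern as a Java string literal (backslash doubled):
    //   ^<[^>]*>\s*  the tag at the very start of the input plus trailing whitespace
    //   &nbsp;       every non-breaking-space entity, anywhere in the input
    static final String PATTERN = "^<[^>]*>\\s*|&nbsp;";

    static String clean(String html) {
        // Each match of either alternative is replaced with ""
        return html.replaceAll(PATTERN, "");
    }

    public static void main(String[] args) {
        String html = "<img class=\"size-full wp-image-10225\" alt=\"animals\" src=\"abc.jpg\"> blah blah blah&nbsp;\n"
                + "<a href=\"http://en.wikipedia.org/wiki/Elephant\">elephant is an animal</a>&nbsp;blah";
        System.out.println(clean(html));
    }
}
```

Since ^ (without the MULTILINE flag) only matches at the start of the input, only the leading img tag is removed; the a tag further in is untouched. Replacing &nbsp; with "" joins "</a>" and "blah"; if spaces are wanted there, as in the question's desired output, one option is to replace &nbsp; with " " in a separate step.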


Jerry


Hey, thanks Jerry. I don't get the Java-escaped part, though. So should I do something like this: str.replace(^<[^>]*>\s*|&nbsp;,"") – user3298846 Mar 28 '14 at 19:21
@user3298846 Use the escaped version in Java. :) – Jerry Mar 28 '14 at 19:22
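One detail behind the follow-up question: String.replace treats its argument as literal text, while String.replaceAll compiles it as a regular expression, so the escaped pattern only has an effect with replaceAll. A small sketch with a shortened, made-up input:

```java
class ReplaceVsReplaceAll {
    public static void main(String[] args) {
        String html = "<img src=\"abc.jpg\"> blah&nbsp;<a href=\"x\">link</a>";
        String regex = "^<[^>]*>\\s*|&nbsp;";
        // replace looks for the pattern characters literally, so nothing is removed
        String literal = html.replace(regex, "");
        // replaceAll interprets the pattern as a regex
        String viaRegex = html.replaceAll(regex, "");
        System.out.println(literal.equals(html)); // true
        System.out.println(viaRegex);             // blah<a href="x">link</a>
    }
}
```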
