I'm trying to use regex in a particular scenario as explained below:

There are many HTML pages, each containing a number of <img src> tags with dynamic values:

Tag1 = <p>Para1 <img src="/A/images/b.txt">Some text</p>   
Tag2 = <p>Para2 <img src="/A/B/images/c.jpeg">Some text</p>  
Tag3 = <p>Para3 <img src="/../images/H/e.png">Some text</p> 
Tag4 = <p>Para4 <img src="/../D/images/G/J/f.gif">Some text</p>

We target the pattern "/<anything>/images/. What we need after the replacement is:

Tag1 = <p>Para1 <img src="/library/MYFOLDER/location/b.txt">Some text</p>
Tag2 = <p>Para2 <img src="/library/MYFOLDER/location/c.jpeg">Some text</p>
Tag3 = <p>Para3 <img src="/library/MYFOLDER/location/H/e.png">Some text</p>
Tag4 = <p>Para4 <img src="/library/MYFOLDER/location/G/J/f.gif">Some text</p>

What's actually happening is very different. The pattern is eating up everything after /images and giving us:

Tag1 = <p>Para1 <img src="/library/MYFOLDER/locationp>
Tag2 = <p>Para2 <img src="/library/MYFOLDER/locationp>
Tag3 = <p>Para3 <img src="/library/MYFOLDER/locationp>
Tag4 = <p>Para4 <img src="/library/MYFOLDER/locationp>

Here is the regex pattern I'm using

"{1}(.){1,}[/images/]{1}

Here is the code:

String subStringTem = "<p><strong>Control Steps:</strong> <img src=\"../images/retain_image.gif\" width=\"20\" > Description.</p>";
String newImagPath = "\"/library/MYFOLDER/location";
final Pattern p = Pattern.compile("\"{1}(.){1,}[/images/]{1}");
final Matcher m = p.matcher(subStringTem);
String result = m.replaceAll(newImagPath);
System.out.println(result);

Expected Result:

<p><strong>Control Steps:</strong> <img src="/library/MYFOLDER/location/retain_image.gif" width="20" > Description.</p>

Actual Result:

<p><strong>Control Steps:</strong> <img src="/library/MYFOLDER/locationp>
Alan Moore
Adi
  • `[/images/]` in a regex matches **one** character that is either `/`, `i`, `m`, `a`, `g`, `e`, or `s` (the last / is redundant). If you want to match the sequence, remove the square brackets. Also, `{1}` is never needed in a regex, and `{1,}` is more concisely represented as `+`. I think you should look over a tutorial on regexes, like [this one](http://docs.oracle.com/javase/tutorial/essential/regex/). – ajb Oct 08 '14 at 23:12
  • [__DO NOT PARSE XML WITH REGEX.__](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Qix - MONICA WAS MISTREATED Oct 08 '14 at 23:25

2 Answers

The biggest mistake in your regex is using square brackets. In a regex, [abc] matches one character that is either a, b, or c; it does not match the substring "abc". So [/images/] does not do what you think it does. Remove the square brackets.

What actually happens with your regex:

"{1}(.){1,}[/images/]{1}

It will match a quote character, followed by one or more occurrences of any character, followed by one of the characters /, i, m, a, g, e, s. (The second / in the brackets is ignored since the set already contains one.) Also, when you tell a regex to match one or more occurrences of any character, by default it does a greedy match, matching as many characters as possible. The match therefore ends at the last character in the input that belongs to the bracketed set, not the nearest one, and that last character is the / in </p>.
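Both effects can be seen in isolation with a small sketch. The class name CharClassDemo and the helper firstMatch are made up for illustration; the input is Tag1 from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassDemo {
    // Hypothetical helper: returns the first match of regex in input, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String tag1 = "<p>Para1 <img src=\"/A/images/b.txt\">Some text</p>";

        // A character class matches exactly one character from the set.
        System.out.println("g".matches("[/images/]"));        // true
        System.out.println("/images/".matches("[/images/]")); // false

        // The greedy (.){1,} runs as far as it can, so the match ends at the
        // last set member in the input: the '/' of "</p>".
        System.out.println(firstMatch("\"{1}(.){1,}[/images/]{1}", tag1));
        // prints: "/A/images/b.txt">Some text</
    }
}
```

Replacing that match with the new path leaves only the trailing p> of the closing tag, which is the truncated output shown in the question.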

Try this regex instead:

".+?/images/

You never need to tell a regex to match exactly one occurrence with {1}; it does that for you automatically. + is a shorthand for {1,}. A ? placed after + makes the quantifier reluctant: it matches the fewest characters it can instead of the most. The pattern then stops at the nearest /images/ substring after the quote.
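Here is a minimal sketch of how that pattern could be plugged into the code from the question. One detail to watch: the pattern consumes the whole /images/ substring, including its trailing slash, so in this sketch the replacement ends with a / (the newImagPath in the question does not). The class and method names are hypothetical, and the sketch assumes the src value is the first quoted attribute on the line that is followed by /images/.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImagePathRewrite {
    // A quote, then as few characters as possible, then the literal "/images/".
    private static final Pattern IMAGES_PREFIX = Pattern.compile("\".+?/images/");

    // Replacement keeps the opening quote and ends with '/', because the
    // pattern above swallows the slash that follows "images".
    private static final String NEW_PREFIX = "\"/library/MYFOLDER/location/";

    static String rewrite(String html) {
        // quoteReplacement guards against '$' or '\' in the replacement text.
        return IMAGES_PREFIX.matcher(html)
                .replaceAll(Matcher.quoteReplacement(NEW_PREFIX));
    }

    public static void main(String[] args) {
        String line = "<p><strong>Control Steps:</strong> "
                + "<img src=\"../images/retain_image.gif\" width=\"20\" > Description.</p>";
        System.out.println(rewrite(line));
        // prints: <p><strong>Control Steps:</strong> <img src="/library/MYFOLDER/location/retain_image.gif" width="20" > Description.</p>

        System.out.println(rewrite("<p>Para4 <img src=\"/../D/images/G/J/f.gif\">Some text</p>"));
        // prints: <p>Para4 <img src="/library/MYFOLDER/location/G/J/f.gif">Some text</p>
    }
}
```

If another quoted attribute appears before src on the same line, the match would start at that earlier quote, so real pages may need a stricter pattern or an HTML parser.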

ajb
  • Really Thank you ajb for the great explanation. I'm very new to the regex and was in parallel going through tutorials but the client timeline is very stringent. I assure I'll go through all the tutorials and will improve my skills in this. Thank you again! – Adi Oct 08 '14 at 23:27

If the part of the location you want to replace is always the same (say you want to replace assets/images/somefolder/a.png with img/a.png), you can simply call the replace method on the string you have. In your case, something with substring might do.

If this is as simple as it looks, using regex is severe overkill. Try something like this:

String src = "/A/images/b.txt";
String othersrc = "/library/MYFOLDER/";
//remove everything from before /images/ and replace it with your path
src = othersrc + src.substring(src.lastIndexOf("/images/") + 1, src.length());
System.out.println(src);

Result:
/library/MYFOLDER/images/b.txt
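
The snippet above keeps the images/ segment in the result. If the goal is the /library/MYFOLDER/location/... form from the question, a variant along these lines might work. It is only a sketch: it assumes the input is the bare src value rather than the whole HTML line, it cuts at the first /images/ rather than the last, and the names SubstringRewrite and rewriteSrc are invented for the example.

```java
public class SubstringRewrite {
    private static final String MARKER = "/images/";
    private static final String NEW_PREFIX = "/library/MYFOLDER/location/";

    // Drops everything up to and including the first "/images/" and
    // prepends the new prefix; returns src unchanged if the marker is absent.
    static String rewriteSrc(String src) {
        int at = src.indexOf(MARKER);
        if (at < 0) {
            return src;
        }
        return NEW_PREFIX + src.substring(at + MARKER.length());
    }

    public static void main(String[] args) {
        System.out.println(rewriteSrc("/A/images/b.txt"));
        // prints: /library/MYFOLDER/location/b.txt
        System.out.println(rewriteSrc("/../D/images/G/J/f.gif"));
        // prints: /library/MYFOLDER/location/G/J/f.gif
    }
}
```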

Zachary Craig
  • Thank you for the reply zack but we don't want to remove everything before "/images". The read String contain complete line having "src" as a portion which contain "/images". Please see the example I explained above. – Adi Oct 08 '14 at 23:17