
I have the message String below. I want to replace every image tag that contains the sequence ?custId=1234 with the new string cid:

 String message = "Need to process  image tag <img src=\"http://danny.oz.au/p/56214815-tripod.jpg?custId=1234\"/>";

This is what I tried after going through a regex tutorial; it replaces every image tag with cid:. I can't work out how to fit one more filter, ?custId=1234, into the regex so that it replaces only those image tags that contain ?custId=1234.

  message = message.replaceAll("\\<img.*?>", "cid:");

EDIT: For example, if the input is

  "Need to process  image tag <img src=\"http://danny.oz.au/p/56214815-tripod.jpg?custId=1234\"/>";

the output should be
"Need to process  image tag cid:";

because the input contains both an img tag and ?custId=1234.

If the input is

     "Need to process  image tag <img src=\"http://danny.oz.au/p/56214815-tripod.jpg?custId=1235\"/>";

the output should be

     "Need to process  image tag <img src=\"http://danny.oz.au/p/56214815-tripod.jpg?custId=1235\"/>";

because the input does not contain ?custId=1234.

M Sach

3 Answers


Try this:

message = message.replaceAll("<img.*?\\?custId=1234.*?>", "cid:");

For your given input string:

"Need to process  image tag <img src=\"http://danny.oz.au/p/56214815-tripod.jpg?"
+ "custId=1234\"/>"

this will give you:

"Need to process  image tag cid:"

And for the input:

"Need to process  image tag <img src=\"http://danny.oz.au/p/56214815-tripod.jpg?custId=1235\"/>"

the output is:

"Need to process  image tag <img src=\"http://danny.oz.au/p/56214815-tripod.jpg?custId=1235\"/>"

Also, I would suggest taking a look at Jsoup, a Java HTML parser, which you should use to parse your HTML. Regex is not a good tool for parsing HTML; it only handles a limited range of tags reliably.

You can also use HTML Cleaner.


UPDATE:

If you want the dot (.) to match newlines as well, you can use the Pattern.DOTALL flag. Alternatively, with String.replaceAll(), which takes no flags argument, you can add (?s) at the start of the pattern, which is equivalent to this flag.

From the Pattern.DOTALL JavaDoc:

Dotall mode can also be enabled via the embedded flag expression (?s). (The s is a mnemonic for "single-line" mode, which is what this is called in Perl.)

So, you can modify your pattern like this:

message = message.replaceAll("(?s)<img.*?\\?custId=1234.*?>", "cid:");
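As a rough check of the pattern above, here is a minimal sketch that runs it against the inputs from the question, a tag split across lines, and a message with two tags. The class name, method names and the STRICT pattern are illustrative assumptions, not part of the original answer. The STRICT variant uses [^>] instead of the dot, assuming attribute values never contain '>', so a match cannot start in one tag and end in a later one.

```java
import java.util.regex.Pattern;

class ImgTagReplaceDemo {
    // Pattern from the answer: (?s) makes '.' match line terminators too.
    static final Pattern ANSWER = Pattern.compile("(?s)<img.*?\\?custId=1234.*?>");

    // Hypothetical stricter variant (an assumption, not from the answer):
    // [^>] cannot cross a tag boundary and already matches newlines.
    static final Pattern STRICT = Pattern.compile("<img[^>]*?\\?custId=1234[^>]*>");

    static String replaceWithAnswerPattern(String message) {
        return ANSWER.matcher(message).replaceAll("cid:");
    }

    static String replaceWithStrictPattern(String message) {
        return STRICT.matcher(message).replaceAll("cid:");
    }

    public static void main(String[] args) {
        String hit = "Need to process  image tag "
                + "<img src=\"http://danny.oz.au/p/56214815-tripod.jpg?custId=1234\"/>";
        String miss = "Need to process  image tag "
                + "<img src=\"http://danny.oz.au/p/56214815-tripod.jpg?custId=1235\"/>";
        String multiLine = "<img\n src=\"a.jpg?custId=1234\"/>";
        String mixed = "<img src=\"a.jpg?custId=1235\"/> and <img src=\"b.jpg?custId=1234\"/>";

        System.out.println(replaceWithAnswerPattern(hit));       // tag replaced by cid:
        System.out.println(replaceWithAnswerPattern(miss));      // unchanged
        System.out.println(replaceWithAnswerPattern(multiLine)); // cid:
        // The reluctant '.' can still run from the first <img into the second tag:
        System.out.println(replaceWithAnswerPattern(mixed));     // cid:
        System.out.println(replaceWithStrictPattern(mixed));     // first tag kept
    }
}
```

Note that neither pattern anchors the end of the id, so custId=12345 would also match; whether that matters depends on your data.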
Rohit Jain
  • @MSach. There is a wide range of `HTML` tags, with a wide range of attributes. With regex, you cannot parse and extract information from all of them. Also, different tags behave differently and can be used in different ways. So you should always use tools that are made specifically for this purpose. Use an HTML parser. – Rohit Jain Oct 26 '12 at 11:24
  • @MSach. Regex is for plain text processing, not for something that follows a nested structure. For example, it is very weak at matching an `opening` bracket with its corresponding `closing` bracket. – Rohit Jain Oct 26 '12 at 11:25
  • @MSach. I hope you got it. You can take a look at the two links I have provided. :) – Rohit Jain Oct 26 '12 at 11:27
  • Rohit, one more favour. Can you explain the metacharacters you used between img and custId, i.e. img.*?\\?custId, and how they work? I tried to look up their meaning at http://www.vogella.com/articles/JavaRegularExpressions/article.html but could not make out why we used .*?\\? between img and custId – M Sach Oct 26 '12 at 12:07
  • `.*?` means match any characters `reluctantly`. The `?` after `*` makes the quantifier reluctant, meaning it matches as few characters as possible before the rest of the pattern matches. Then we need to match the literal `?` of `?custId`. For that we escape `?` with a backslash, `\\?`, because `?` has a special meaning. – Rohit Jain Oct 26 '12 at 12:09
  • For example, in the string `My name is Rohit. Hello Rohit`, `.*Rohit` will match every character up to the last `Rohit`, whereas `.*?Rohit` will only match the characters up to the first `Rohit` – Rohit Jain Oct 26 '12 at 12:10
  • For more on quantifiers, see the [`tutorial`](http://docs.oracle.com/javase/tutorial/essential/regex/quant.html), and for more on escape characters, see the [`Pattern Docs`](http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html) – Rohit Jain Oct 26 '12 at 12:16
  • @MSach. Thanks. Glad that I could help you :) – Rohit Jain Oct 27 '12 at 09:17
  • Rohit, one more thing on the same regex. I want to check whether a given string contains a match for the above regex. Java's String does not have a contains(Regex regex) method. I tried Pattern p = Pattern.compile("\\", Pattern.CASE_INSENSITIVE); String messageBody2 = "

    Test Customer

    \"\"

    "; boolean test = p.matcher(messageBody2).matches(); but test returns false when it should be true.
    – M Sach Nov 01 '12 at 13:33
  • @MSach. You don't need to escape `&`. Just try: `Pattern.compile("\\");` – Rohit Jain Nov 01 '12 at 17:56
  • Rohit, I found one issue with the regex "\\". If the img tag is written so that src="someURL" starts on the line after img, it does not work. To test this, I created a text file where src starts on the line after img, read the file with a FileReader into a string, say contentStr, and called contentStr.replaceAll("\\", "abcd"). It does not work; the regex "\\" does not match across the end-of-line character – M Sach Nov 06 '12 at 09:40
  • But as soon as I remove the end-of-line character between img and src, it works. Is there a way to make the regex "\\" ignore the end-of-line character in between? – M Sach Nov 06 '12 at 09:42
  • @MSach. And you don't need to escape your `<`. I think I mistyped it previously. I have edited that part. You can see the change. – Rohit Jain Nov 06 '12 at 09:54
  • Thanks a lot, Rohit, it worked. You deserve a separate accepted answer for this. I posted a new question at http://stackoverflow.com/questions/13249099/ignoring-the-line-break-in-regex. Please post your answer there so that I can accept it – M Sach Nov 06 '12 at 10:33
  • @MSach.. haha :) You're welcome. But that was not needed, as your problem is already solved. Still I have added it there as an answer. You can only accept it after 15 minutes. :) – Rohit Jain Nov 06 '12 at 10:39
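
Two points from the comments above can be sketched in code: the greedy versus reluctant example, and checking whether a string contains a match. The regex in the matches() comment was lost from the page, so the explanation here is an assumption rather than something the thread confirms: Matcher.matches() requires the whole input to match, while Matcher.find() looks for a match anywhere in it. The class name, helper names and sample body are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierAndFindDemo {
    // Returns the first substring matched by the regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // True if any part of the input matches the regex (a "contains" check).
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // True only if the entire input matches the regex.
    static boolean wholeInputMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String text = "My name is Rohit. Hello Rohit";
        System.out.println(firstMatch(".*Rohit", text));  // My name is Rohit. Hello Rohit
        System.out.println(firstMatch(".*?Rohit", text)); // My name is Rohit

        // Hypothetical message body; the one in the comment did not survive.
        String body = "<p>Test Customer</p>\n<img src=\"x.jpg?custId=1234\"/>";
        String regex = "(?s)<img.*?\\?custId=1234.*?>";
        System.out.println(wholeInputMatches(regex, body)); // false
        System.out.println(containsMatch(regex, body));     // true
    }
}
```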

You could use the String contains method to first filter for messages that contain "custId=1234", e.g.

// Only run the replacement when the marker appears somewhere in the message.
if (message.contains("custId=1234")) {
    message = message.replaceAll("\\<img.*?>", "cid:");
}
Kate
  • What if the message contains both a tag with `custId=1234` and one without it? – Rohit Jain Oct 26 '12 at 11:14
  • I didn't get the impression from the example that this was a possibility, but if it is, my suggestion would need refining: first extract each image tag as a substring, then perform the test for custId on just that string. – Kate Oct 26 '12 at 11:21
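
The refinement described in the last comment, testing each image tag on its own, could look roughly like the sketch below. It is one possible reading of that suggestion, not code from the thread. The class and method names are hypothetical, and the tag pattern assumes attribute values never contain '>'.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PerTagReplaceDemo {
    // Matches a single <img ...> tag; assumes no '>' inside attribute values.
    private static final Pattern IMG_TAG = Pattern.compile("<img[^>]*>");

    // Replaces only those img tags whose text contains the marker.
    static String replaceImgTagsContaining(String message, String marker, String replacement) {
        Matcher m = IMG_TAG.matcher(message);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String tag = m.group();
            String result = tag.contains(marker) ? replacement : tag;
            // quoteReplacement keeps '$' and '\' in the tag from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(result));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String mixed = "<img src=\"a.jpg?custId=1235\"/> and <img src=\"b.jpg?custId=1234\"/>";
        // Only the second tag carries the marker, so only it is replaced.
        System.out.println(replaceImgTagsContaining(mixed, "?custId=1234", "cid:"));
    }
}
```

Because the marker is checked with a plain contains() on each tag, no regex escaping of '?' is needed for it.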

I think I have got what you need. Demo:

"Need to process image tag.*\?custId=(\d+)"

This regex captures your id in group 1, so you can then print "Need to process the cid:" followed by match.group(1), or whatever else you need.
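
In Java the capturing-group idea could be sketched as follows. This is only an illustration: the class and method names are hypothetical, and the pattern is shortened to the custId part so that it does not depend on the exact spacing of the surrounding text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CustIdExtractDemo {
    // Group 1 captures the digits that follow "?custId=".
    private static final Pattern CUST_ID = Pattern.compile("\\?custId=(\\d+)");

    // Returns the first custId found in the message, or null if there is none.
    static String extractCustId(String message) {
        Matcher m = CUST_ID.matcher(message);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String message = "Need to process  image tag "
                + "<img src=\"http://danny.oz.au/p/56214815-tripod.jpg?custId=1234\"/>";
        System.out.println(extractCustId(message)); // 1234
    }
}
```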

Javier Diaz