I have the following content in a text file:

  some texting content <img  src="cid:part123" alt=""> <b> Test</b>

I read it from the file and store it in a String, inputString:

   expectedString = inputString.replaceAll("\\<img.*?cid:part123.*?>",
    "NewContent");

I get the expected output:

     some texting content NewContent <b> Test</b>

However, if there is an end-of-line character between img and src, as in the example below, the replacement does not happen:

 <img  
          src="cid:part123" alt="">

Is there a way to make the regex ignore end-of-line characters in between while matching?

M Sach

3 Answers

If you want the dot (.) to match newlines as well, you can use the Pattern.DOTALL flag. Alternatively, with String.replaceAll(), you can add (?s) at the start of the pattern, which is equivalent to this flag.

From the Pattern.DOTALL Javadoc:

Dotall mode can also be enabled via the embedded flag expression (?s). (The s is a mnemonic for "single-line" mode, which is what this is called in Perl.)

So you can modify your pattern like this:

expectedStr = inputString.replaceAll("(?s)<img.*?cid:part123.*?>", "Content");

NOTE: You don't need to escape the angle bracket (<).
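To make the difference concrete, here is a minimal sketch that runs the same pattern with and without (?s). The class and method names are hypothetical and exist only for illustration; the pattern and replacement text are the ones from the question.

```java
// Illustrative sketch; InlineDotallDemo and its method names are hypothetical.
class InlineDotallDemo {

    // (?s) enables DOTALL, so '.' also matches line terminators
    // and the tag may span several lines.
    static String withDotall(String input) {
        return input.replaceAll("(?s)<img.*?cid:part123.*?>", "NewContent");
    }

    // Same pattern without (?s): '.' stops at a line break, so a tag
    // that is split across lines is left untouched.
    static String withoutDotall(String input) {
        return input.replaceAll("<img.*?cid:part123.*?>", "NewContent");
    }
}
```

For the multi-line input from the question, withDotall(...) replaces the whole tag, while withoutDotall(...) returns the string unchanged.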

Rohit Jain
  • Hey Rohit, can you help me with http://stackoverflow.com/questions/13865750/why-this-regex-not-giving-expected-output? It is a question related to this one, but somehow I am not getting the expected outcome when I have two img tags. See if you can help. Thanks in advance. – M Sach Dec 13 '12 at 18:12

By default, the . character will not match newline characters. You can enable this behavior by specifying the Pattern.DOTALL flag. In String.replaceAll(), you do this by attaching a (?s) to the front of your pattern:

expectedString = inputString.replaceAll("(?s)\\<img.*?cid:part123.*?>", 
    "NewContent");

See also Pattern.DOTALL with String.replaceAll
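If you would rather pass the flag explicitly than embed (?s) in the pattern, a precompiled java.util.regex.Pattern is one option. The following is only a sketch; the class, constant, and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Illustrative sketch; DotallFlagDemo, IMG_CID and replaceImg are hypothetical names.
class DotallFlagDemo {

    // Pattern.DOTALL is passed as a compile flag instead of the inline (?s).
    private static final Pattern IMG_CID =
            Pattern.compile("<img.*?cid:part123.*?>", Pattern.DOTALL);

    // Replaces every match of the precompiled pattern in the input.
    static String replaceImg(String input) {
        return IMG_CID.matcher(input).replaceAll("NewContent");
    }
}
```

Compiling once also avoids recompiling the expression on every call, which String.replaceAll() does internally.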

lc.

You need to use Pattern.DOTALL mode.

replaceAll() doesn't take mode flags as a separate argument, but you can enable them in the expression as follows:

expectedString = inputString.replaceAll("(?s)\\<img.*?cid:part123.*?>", ...);

Note, however, that it's not a good idea to parse HTML with regular expressions. It would be better to use an HTML parser instead.
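One way to see how fragile the regex is: with two img tags, the lazy .*? can start at the first tag and run on until it finds the cid in the second, so both tags are replaced as one match. The sketch below shows this, together with a slightly tighter pattern that uses [^>]* so a match cannot cross a tag boundary (a negated character class also matches newlines, so no (?s) is needed). The class and method names are hypothetical, and the tighter pattern is still only a workaround: it breaks if an attribute value contains '>'.

```java
// Illustrative sketch; TwoTagsDemo and its method names are hypothetical.
class TwoTagsDemo {

    // Lazy dot with DOTALL: the match may begin at an earlier <img tag
    // and extend across '>' until "cid:part123" is found.
    static String lazyDot(String input) {
        return input.replaceAll("(?s)<img.*?cid:part123.*?>", "NewContent");
    }

    // [^>]* cannot cross '>', so the match stays inside a single tag.
    // It also matches line breaks, so multi-line tags still work.
    static String negatedClass(String input) {
        return input.replaceAll("<img[^>]*cid:part123[^>]*>", "NewContent");
    }
}
```

For the input `<img src="a.png"> keep <img src="cid:part123" alt="">`, lazyDot(...) returns just `NewContent`, whereas negatedClass(...) returns `<img src="a.png"> keep NewContent`.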

axtavt