
I have the following regex:

in = in.replaceAll(" d+\n", "");

I want to use it to get rid of the " d" at the end of lines such as these:

But I just won't do that d
<i>I just won't do that</i> d

No, no-no-no, no, no d

What is wrong with my regex? It doesn't remove the trailing d.


2 Answers


Most probably your lines are separated not only by \n but by \r\n. You can use \r?\n to allow an optional \r before \n. Let's also not forget the last d, which doesn't have any line separator after it. To handle it, add $ to your regex; it is an anchor representing the end of your data. So your final pattern could look like this:

in.replaceAll(" d+(\r?\n|$)", "")
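A minimal sketch that reproduces the problem and the fix, assuming the input uses Windows-style \r\n separators. The class and method names are hypothetical. Because the line separator is part of the match, this version also joins the lines.

```java
public class TrailingDDemo {
    // Pattern from the question: requires "\n" immediately after the d's.
    static String original(String in) {
        return in.replaceAll(" d+\n", "");
    }

    // Pattern from the answer: optional "\r" before "\n", or end of input.
    static String fixed(String in) {
        return in.replaceAll(" d+(\r?\n|$)", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample with Windows-style "\r\n" separators.
        String in = "But I just won't do that d\r\nNo, no-no-no, no, no d";
        // Nothing matches: "\r" sits between the d and "\n", and the last d has no "\n".
        System.out.println(original(in).equals(in)); // true
        // Both d's are removed, but so is the "\r\n" between the lines.
        System.out.println(fixed(in)); // But I just won't do thatNo, no-no-no, no, no
    }
}
```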

If you don't want to remove the line separators, use the end-of-line anchor $ with the MULTILINE flag (?m) instead of matching the separators themselves:

in.replaceAll("(?m) d+$", "")

This also handles the last d, which has no line separator after it.
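A small sketch (hypothetical class and method names, same assumed \r\n input) showing that the (?m) version keeps the separators, and that the inline (?m) is equivalent to passing Pattern.MULTILINE to Pattern.compile:

```java
import java.util.regex.Pattern;

public class MultilineDemo {
    // (?m) makes $ match before each line terminator, so the separators stay.
    static String stripInline(String in) {
        return in.replaceAll("(?m) d+$", "");
    }

    // Same idea using the Pattern.MULTILINE flag instead of the inline (?m).
    static String stripWithFlag(String in) {
        return Pattern.compile(" d+$", Pattern.MULTILINE).matcher(in).replaceAll("");
    }

    public static void main(String[] args) {
        String in = "But I just won't do that d\r\nNo, no-no-no, no, no d";
        System.out.println(stripInline(in).equals("But I just won't do that\r\nNo, no-no-no, no, no")); // true
        System.out.println(stripWithFlag(in).equals(stripInline(in))); // true
    }
}
```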


In Java, when the MULTILINE flag is specified, $ matches the empty string:

  • Before a line terminator:
    • A carriage-return character followed immediately by a newline character ("\r\n")
    • Newline (line feed) character ('\n') without carriage-return ('\r') right in front
    • Standalone carriage-return character ('\r')
    • Next-line character ('\u0085')
    • Line-separator character ('\u2028')
    • Paragraph-separator character ('\u2029')
  • At the end of the string

When the UNIX_LINES flag is specified along with MULTILINE, only '\n' is recognized as a line terminator, so $ matches the empty string right before a newline ('\n') or at the end of the string.
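To see the list above in action, here is a sketch (hypothetical class and helper names) that puts one line terminator of each kind into a single string and compares MULTILINE alone with MULTILINE | UNIX_LINES:

```java
import java.util.regex.Pattern;

public class LineTerminatorDemo {
    // Removes " d+" wherever $ matches under the given flags.
    static String strip(String in, int flags) {
        return Pattern.compile(" d+$", flags).matcher(in).replaceAll("");
    }

    public static void main(String[] args) {
        // One line ending for each terminator listed above, plus the end of the string.
        String in = "a d\r\nb d\nc d\re d\u0085f d\u2028g d\u2029h d";

        // MULTILINE alone: every " d" before a terminator or at the end is removed.
        System.out.println(strip(in, Pattern.MULTILINE)
                .equals("a\r\nb\nc\re\u0085f\u2028g\u2029h")); // true

        // MULTILINE + UNIX_LINES: only "\n" is a terminator, so just the " d"
        // before "\n" and the one at the end of the string are removed.
        System.out.println(strip(in, Pattern.MULTILINE | Pattern.UNIX_LINES)
                .equals("a d\r\nb\nc d\re d\u0085f d\u2028g d\u2029h")); // true
    }
}
```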


Anyway, if possible, don't use regex to process HTML.


As Pshemo states in his answer, your string most likely contains Windows-style line separators, which are \r\n rather than just \n.

You can modify your regex to account for both kinds of line separator (plus the case where the string ends with a d that has no newline after it):

in = in.replaceAll("(d+(?=\r\n)|d+(?=\n)|d+$)","");

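This regex removes any run of d characters that is followed by \r\n, followed by \n, or at the end of the string (d+$). The lookaheads (?=...) only check for the separator without consuming it, so the line separators themselves are kept.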
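A quick sketch of this pattern on the same assumed \r\n input (class and method names are hypothetical):

```java
public class LookaheadDemo {
    // The answer's pattern: the lookaheads test for "\r\n" or "\n" without
    // consuming them, so the line separators survive the replacement.
    static String strip(String in) {
        return in.replaceAll("(d+(?=\r\n)|d+(?=\n)|d+$)", "");
    }

    public static void main(String[] args) {
        String in = "But I just won't do that d\r\nNo, no-no-no, no, no d";
        // The space before each removed d is not part of the pattern, so it stays.
        System.out.println(strip(in).equals("But I just won't do that \r\nNo, no-no-no, no, no ")); // true
    }
}
```

Because the pattern has no leading space, it also trims the d from a word such as "said" at the end of a line; prefixing each branch with a space, as in the question's original pattern, avoids that.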

  • You can get rid of the first two branches in your regex; `$` covers all the bases. – Alan Moore Jan 15 '14 at 03:30
  • @AlanMoore Unfortunately, not when dealing with `String::replaceAll(regex, replacement)`. `$` will only match the end of the String, not the end of a line. – Taylor Hx Jan 15 '14 at 03:35
  • @Daemon: `$` should cover all cases, when multiline flag is used `(?m)`. `$` will match right before `\r\n` or `\r` or `\n` (match before `\n` only if no preceding `\r`). – nhahtdh Jan 15 '14 at 05:23
  • Doh! Yes, I was assuming multiline mode, but I forgot to mention it. – Alan Moore Jan 15 '14 at 20:35
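As the comments conclude, in MULTILINE mode a single d+$ branch does the job of all three. A sketch comparing the two (hypothetical class and method names; the samples only use \r\n and \n separators, since the alternation does not treat other terminators such as a standalone \r the same way mid-string):

```java
public class AlternationVsMultiline {
    // The answer's three-branch pattern, without MULTILINE.
    static String alternation(String in) {
        return in.replaceAll("(d+(?=\r\n)|d+(?=\n)|d+$)", "");
    }

    // With (?m), $ already matches before "\r\n" and "\n", so one branch suffices.
    static String multiline(String in) {
        return in.replaceAll("(?m)d+$", "");
    }

    public static void main(String[] args) {
        String[] samples = { "a d\r\nb d", "a d\nb d\n", "dd\r\nd" };
        for (String s : samples) {
            System.out.println(alternation(s).equals(multiline(s))); // true for each sample
        }
    }
}
```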