I am trying to remove the DTD (the DOCTYPE declaration) from all the XML documents in a directory, using a regex.

This is the expression I am using to remove it:

<!DOCTYPE[^>[]*(\[[^]]*\])?>

But I am getting the following error:

java.util.regex.PatternSyntaxException: Unclosed character class near index 27
<!DOCTYPE[^>[]*(\[[^]]*\])?>
                           ^

Could someone tell me the Java equivalent of this regex?

BarathVutukuri
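One likely explanation is that `java.util.regex` treats an unescaped `[` inside a character class as the start of a nested class (Java supports unions such as `[a-d[m-p]]`). In the question's pattern, the `[` right after `>` therefore opens a second class, and the brackets that follow never balance, so the parser reaches the end of the pattern (index 27) with a class still open. The sketch below demonstrates this; the class name `NestedClassDemo` and the helper `compiles` are hypothetical names chosen for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class NestedClassDemo {
    // Returns true if the regex compiles under java.util.regex, false otherwise.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // '[' inside a class opens a nested class (a union): "[a[b]]" means "a or b".
        System.out.println(Pattern.matches("[a[b]]", "b")); // true
        System.out.println(Pattern.matches("[a[b]]", "[")); // false: '[' is not a literal here

        // The question's pattern (backslashes doubled for the string literal):
        // the '[' right after '>' opens a nested class that is never closed.
        System.out.println(compiles("<!DOCTYPE[^>[]*(\\[[^]]*\\])?>"));     // false
        // Escaping the brackets inside the character classes lets it compile.
        System.out.println(compiles("<!DOCTYPE[^>\\[]*(\\[[^\\]]*\\])?>")); // true
    }
}
```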

1 Answer

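In a Java regex, an unescaped `[` inside a character class opens a nested class, so literal `[` and `]` inside a class should be escaped with a backslash. Because the pattern is written as a Java string literal, each of those backslashes must itself be doubled.
Try this: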

final String regex = "<!DOCTYPE[^>\\[]*(\\[[^\\]]*\\])?>";

Here is DEMO
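A minimal sketch of using that pattern to strip the declaration from an XML string. `DoctypeStripper` and `stripDoctype` are hypothetical names, and the sample XML is made up for illustration.

```java
import java.util.regex.Pattern;

public class DoctypeStripper {
    // The answer's pattern. In the string literal each regex backslash is written as "\\",
    // so the regex engine actually sees: <!DOCTYPE[^>\[]*(\[[^\]]*\])?>
    static final Pattern DOCTYPE = Pattern.compile("<!DOCTYPE[^>\\[]*(\\[[^\\]]*\\])?>");

    // Removes every DOCTYPE declaration (including an internal subset in [...]) from the text.
    static String stripDoctype(String xml) {
        return DOCTYPE.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE note SYSTEM \"note.dtd\">\n"
                + "<note>hi</note>";
        System.out.println(stripDoctype(xml));
    }
}
```

The newline that followed the declaration is left in place. Because `[^\]]*` stops at the first `]`, a DOCTYPE whose internal subset itself contains a `]` (for example inside a comment or a quoted value) would not match at all and would be left untouched; for input like that, an XML parser is the safer tool.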

MaxZoom
  • That is still giving the same error. Unclosed character class near index 27 []*(\[[^]]*\])?> – BarathVutukuri Apr 21 '17 at 13:57
  • Of course `[^]]` is not correct. In **`java.util.regex` pattern**, you must escape the `[` and `]` inside a character class. You do not have to escape a `]` outside a character class. – Wiktor Stribiżew Apr 21 '17 at 13:58
  • Inside a character-GROUP you don't need to escape. You are just missing one "closing" Bracket, that finishes the Group you start with `DOCTYPE[` – dognose Apr 21 '17 at 14:00
  • Not exactly. A backslash in a java string literal must be escaped by another backslash. There are no "special characters"; there is only one. Escaping is distinct from *escape sequences* like `\n` for newline etc. – Bohemian Apr 21 '17 at 14:00
  • @dognose, it seems like I have closed all the brackets properly. Pardon me if I am wrong. – BarathVutukuri Apr 21 '17 at 14:10
  • It seems the second closing `]` should be escaped as well – MaxZoom Apr 21 '17 at 14:30
  • @MaxZoom Thanks, Your Demo worked. Saved my life. Thanks :) – BarathVutukuri Apr 21 '17 at 14:47
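The question's actual goal was to strip the declaration from every XML document in a directory. A minimal sketch of that, assuming the files are UTF-8, only `*.xml` files directly inside the directory are processed (no recursion), and each file is overwritten in place; `StripDoctypeInDirectory` and `stripAll` are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

public class StripDoctypeInDirectory {
    // The pattern from the accepted answer (backslashes doubled for the string literal).
    static final Pattern DOCTYPE = Pattern.compile("<!DOCTYPE[^>\\[]*(\\[[^\\]]*\\])?>");

    // Rewrites every *.xml file directly inside dir without its DOCTYPE (assumes UTF-8).
    // Returns the number of files that were actually changed.
    static int stripAll(Path dir) throws IOException {
        int changed = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.xml")) {
            for (Path file : files) {
                String xml = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                String stripped = DOCTYPE.matcher(xml).replaceAll("");
                if (!stripped.equals(xml)) {
                    Files.write(file, stripped.getBytes(StandardCharsets.UTF_8));
                    changed++;
                }
            }
        }
        return changed;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: the directory is passed as the first command-line argument.
        System.out.println(stripAll(Paths.get(args[0])) + " file(s) updated");
    }
}
```

Files without a DOCTYPE are left untouched, so running the tool twice is harmless. Since it overwrites files in place, trying it on a copy of the directory first is a sensible precaution.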