
I have a string `"<?xml version=2.0><rss>Feed</rss>"`. I wrote a regex to match this string:

`"<?xml.*<rss.*</rss>"`

But if the input string contains `\n`, as in `"\nFeed"`, the above regex doesn't match it.

How can I modify my regex so that it also matches `\n` characters between the strings?
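A minimal Java reproduction of the problem. It assumes the regex is applied with `Matcher.find()` (the question doesn't say which method is used), and the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

// Reproduces the question's problem: by default "." does not match "\n".
class NewlineRepro {
    // The question's regex, unchanged.
    static final Pattern ORIGINAL = Pattern.compile("<?xml.*<rss.*</rss>");

    // find() looks for the pattern anywhere in the input.
    static boolean found(String input) {
        return ORIGINAL.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("<?xml version=2.0><rss>Feed</rss>"));   // true
        System.out.println(found("<?xml version=2.0><rss>\nFeed</rss>")); // false
    }
}
```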

Achaius
  • It all depends on where you want to catch `\n`. After ` – asteri Jun 20 '14 at 13:15
  • Please, to keep us all sane and safe, don't parse XML with regular expressions. See http://stackoverflow.com/a/1732454/1907906 –  Jun 20 '14 at 13:17
  • You need to use the `s` (*DOTALL*) modifier, which forces `.` to match newline sequences. – hwnd Jun 20 '14 at 13:17
  • You can use a [DOTALL option](http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#DOTALL) to let `.` also capture `\r` and `\n`: `"(?s)...."`. Mind that without it a Windows line break (CR+LF, "\r\n") would not work either. – Joop Eggen Jun 20 '14 at 13:20
  • Try `<\?xml.*`: escape `?` and '/' with the escape character `\\`. – Braj Jun 20 '14 at 13:32
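Braj's point about escaping `?` can be checked with a short Java sketch. It assumes whole-string matching via `Pattern.matches` (the question doesn't say which method is used); `EscapeQuestionMark` and `wholeMatch` are hypothetical names. In `<?xml`, the `?` makes `<` optional instead of matching a literal `?`:

```java
import java.util.regex.Pattern;

// Compares the unescaped and escaped "?" in "<?xml" under whole-string matching.
class EscapeQuestionMark {
    // Pattern.matches requires the entire input to match the regex.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String xml = "<?xml version=2.0><rss>Feed</rss>";
        System.out.println(wholeMatch("<?xml.*<rss.*</rss>", xml));   // false
        System.out.println(wholeMatch("<\\?xml.*<rss.*</rss>", xml)); // true
    }
}
```

The `/` in `</rss>` needs no escaping in a Java regex; only the `?` does.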

2 Answers


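The matching behavior of a dot can be controlled with a flag. In Java, by default the dot matches any character except the line terminators (`\n`, `\r`, `\r\n`, `\u0085`, `\u2028` and `\u2029`).

I'm not a Java programmer, but usually using `(?s)` at the beginning of a search string changes the matching behavior of the dot to any character, including line terminators. Java supports this embedded flag as well; it is equivalent to `Pattern.DOTALL`. So `"(?s)<\\?xml.*<rss.*</rss>"` should work (the `?` after `<` is escaped so that it matches a literal `?` instead of making `<` optional).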
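A minimal Java sketch of both ways to enable DOTALL, using the question's regex with the `?` of `<?xml` escaped so it is matched literally. The class name is hypothetical; the Windows input shows that `\r\n` is covered as well:

```java
import java.util.regex.Pattern;

// Two equivalent ways to let "." match line terminators in Java.
class DotAllDemo {
    // Embedded flag: "(?s)" at the start of the pattern turns on DOTALL mode.
    static final Pattern EMBEDDED = Pattern.compile("(?s)<\\?xml.*<rss.*</rss>");

    // Compile-time flag: passing Pattern.DOTALL has the same effect.
    static final Pattern FLAGGED = Pattern.compile("<\\?xml.*<rss.*</rss>", Pattern.DOTALL);

    public static void main(String[] args) {
        String unix = "<?xml version=2.0><rss>\nFeed</rss>";
        String windows = "<?xml version=2.0><rss>\r\nFeed</rss>";
        System.out.println(EMBEDDED.matcher(unix).matches());    // true
        System.out.println(FLAGGED.matcher(unix).matches());     // true
        System.out.println(EMBEDDED.matcher(windows).matches()); // true
    }
}
```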

But better here would be to use `"<\\?xml.*?<rss[\\s\\S]*?</rss>"` as the search string, because it does not depend on any flag.

`\s` matches any whitespace character, which includes `\r` and `\n`, and `\S` matches any non-whitespace character. Together in square brackets, `[\s\S]` matches any character.

For completeness: `[\w\W]` also always matches any character.
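A Java sketch of the flag-free approach, with the `?` of `<?xml` escaped so it is matched literally; the class and helper names are hypothetical. The last check illustrates that the lazy `*?` stops at the first `</rss>`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Character classes that match any character, line terminators included,
// without turning on DOTALL.
class AnyCharClassDemo {
    static final Pattern WHITESPACE_CLASS = Pattern.compile("<\\?xml.*?<rss[\\s\\S]*?</rss>");

    // The same idea written with [\w\W] instead of [\s\S].
    static final Pattern WORD_CLASS = Pattern.compile("<\\?xml.*?<rss[\\w\\W]*?</rss>");

    // Returns the first match in the input, or null if there is none.
    static String firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<?xml version=2.0><rss>\nFeed</rss>";
        System.out.println(WHITESPACE_CLASS.matcher(input).find()); // true
        System.out.println(WORD_CLASS.matcher(input).find());       // true

        // The lazy "*?" stops at the first "</rss>" rather than the last one.
        String two = "<?xml version=2.0><rss>\nA</rss><rss>B</rss>";
        System.out.println(firstMatch(WHITESPACE_CLASS, two).equals("<?xml version=2.0><rss>\nA</rss>")); // true
    }
}
```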

Mofi

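You can combine your regex with `(\\n)*` where the newlines can appear. The backslash has to be doubled because it is an escape character in Java string literals, so `"\\n"` passes the regex escape `\n` to the pattern. Another option is to call `replaceAll("\\n", "")` on the input before applying the regex.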
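A Java sketch of both workarounds. The position of `(\\n)*` (right after `<rss>`) is an assumption, since the answer doesn't say where to put it, and the class and method names are hypothetical. The `\r\n` input shows that both only deal with `\n`:

```java
import java.util.regex.Pattern;

// Sketches of the two newline workarounds; neither handles "\r".
class NewlineWorkarounds {
    // Workaround 1: allow optional "\n" characters (placement assumed: after <rss>).
    static final Pattern WITH_OPTIONAL_NEWLINES = Pattern.compile("<\\?xml.*<rss>(\\n)*.*</rss>");

    // Regex applied after newlines have been stripped from the input.
    static final Pattern PLAIN = Pattern.compile("<\\?xml.*<rss.*</rss>");

    static boolean optionalNewlines(String input) {
        return WITH_OPTIONAL_NEWLINES.matcher(input).matches();
    }

    // Workaround 2: remove every "\n" from the input, then match.
    static boolean stripThenMatch(String input) {
        return PLAIN.matcher(input.replaceAll("\\n", "")).matches();
    }

    public static void main(String[] args) {
        String unix = "<?xml version=2.0><rss>\nFeed</rss>";
        String windows = "<?xml version=2.0><rss>\r\nFeed</rss>";
        System.out.println(optionalNewlines(unix));    // true
        System.out.println(stripThenMatch(unix));      // true
        // A "\r" remains, and "." still does not match it:
        System.out.println(optionalNewlines(windows)); // false
        System.out.println(stripThenMatch(windows));   // false
    }
}
```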

Alan Moore
rafaborrego
  • I added code formatting so we can read the answer as you intended it, but it's still a bad answer. For one thing, `\n` is not the only character that `.` doesn't match. – Alan Moore Jun 20 '14 at 14:27
  • Thank you for editing it; I posted it from my phone and didn't remember the formatting syntax. And yes, you are right: those options are not good solutions, just workarounds for the `\n` problem mentioned in the question. – rafaborrego Jun 20 '14 at 17:11