Matcher gives different results on Ubuntu vs. Windows

Question

I'm running the exact same Eclipse project on Ubuntu and on Windows, but I'm getting different output.

The inconsistent behavior occurs in the following code:

String regex = "<token id=\"(.*)\">.*\n.*<word>(.*)</word>.*\n.*<lemma>(.*)</lemma>.*\n.*\n.*\n.*<POS>(.*)</POS>";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(fileAsString);
while (matcher.find()) {
    ...
}

The matcher.find() check returns false on Windows but true on Ubuntu (which is the expected behavior).

Eclipse Juno and jdk7 on both.

Maybe it's not related to the operating system, but that's the only difference I found after debugging both side by side and checking the project's properties in the two environments.

Any idea what causes the difference?

I assume this might help you: http://stackoverflow.com/questions/207947/java-how-do-i-get-a-platform-independent-new-line-character — MByD, Jan 10 '13 at 20:50
Could you specify which JDK is being used on each machine? Are you using openjdk on Ubuntu? — Eric Wilson, Jan 10 '13 at 20:50

score 4 · Accepted Answer · answered Jan 10 '13 at 20:51

You're matching \n, which is the line ending for Linux, but not Windows (you need \r\n for Windows). Something like \r?\n would fix your specific problem.
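A minimal sketch of that fix, assuming the file looks roughly like the hypothetical sample below (the question does not show the real input, so the sample text, the class name LineEndingRegexDemo and the helper countMatches are made up for illustration). The pattern is the one from the question with every \n replaced by \r?\n:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineEndingRegexDemo {
    // The question's pattern, with each \n replaced by \r?\n so that
    // both Unix (LF) and Windows (CRLF) line endings are accepted.
    static final Pattern TOKEN = Pattern.compile(
        "<token id=\"(.*)\">.*\\r?\\n"
        + ".*<word>(.*)</word>.*\\r?\\n"
        + ".*<lemma>(.*)</lemma>.*\\r?\\n"
        + ".*\\r?\\n"
        + ".*\\r?\\n"
        + ".*<POS>(.*)</POS>");

    // Counts how many token blocks the pattern finds in the text.
    static int countMatches(String text) {
        Matcher matcher = TOKEN.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical input; the two lines between <lemma> and <POS>
        // are assumed from the shape of the regex.
        String unix = "<token id=\"1\">\n"
            + "  <word>Dogs</word>\n"
            + "  <lemma>dog</lemma>\n"
            + "  <CharacterOffsetBegin>0</CharacterOffsetBegin>\n"
            + "  <CharacterOffsetEnd>4</CharacterOffsetEnd>\n"
            + "  <POS>NNS</POS>\n"
            + "</token>\n";
        String windows = unix.replace("\n", "\r\n");

        System.out.println(countMatches(unix));    // 1
        System.out.println(countMatches(windows)); // 1
    }
}
```

With the original \n-only pattern, the Windows string would produce no match at all, which is the symptom described in the question.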

That said, you should never parse anything HTML-like (including XML) with regex. You're missing out on everything XML is about, not the least of which is its tolerance for hand-written "mistakes" such as a different order of tags, extra spaces, etc.
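As a rough illustration of the parser route, here is a sketch using the JDK's built-in DOM parser. It assumes the file is well-formed XML with token elements that contain word, lemma and POS children; the class name TokenXmlDemo, the helpers readTokens and firstDescendantText, and the sample document are hypothetical, since the question does not show the real file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class TokenXmlDemo {
    // Returns one {id, word, lemma, POS} array per <token> element.
    static List<String[]> readTokens(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList tokens = doc.getElementsByTagName("token");
            List<String[]> result = new ArrayList<String[]>();
            for (int i = 0; i < tokens.getLength(); i++) {
                Element token = (Element) tokens.item(i);
                result.add(new String[] {
                    token.getAttribute("id"),
                    firstDescendantText(token, "word"),
                    firstDescendantText(token, "lemma"),
                    firstDescendantText(token, "POS")
                });
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Text of the first descendant element with the given tag, or "" if absent.
    static String firstDescendantText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Hypothetical document with Windows line endings and reordered children.
        String xml = "<tokens>\r\n"
            + "  <token id=\"1\">\r\n"
            + "    <POS>NNS</POS>\r\n"
            + "    <word>Dogs</word>\r\n"
            + "    <lemma>dog</lemma>\r\n"
            + "  </token>\r\n"
            + "</tokens>";
        for (String[] t : readTokens(xml)) {
            System.out.println(t[0] + " " + t[1] + " " + t[2] + " " + t[3]);
        }
    }
}
```

The parser handles line endings, indentation and child order itself, so none of them need to appear in the extraction logic.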

score 1 · Answer 2 · answered Jan 10 '13 at 20:50

It might be a difference in end of line characters. Try adding an optional \r to the regex.

WW.

score 1 · Answer 3 · answered Jan 10 '13 at 20:51

Very probably because of the line endings. The dot does not match line endings by default, and you explicitly look for \n in your regex.

Try and compile your pattern with Pattern.DOTALL, or put \r?\n everywhere you have \n in the regex.
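The two options do not behave identically, which the small sketch below tries to show on a made-up two-line input with Windows line endings (the class name DotLineTerminatorDemo, the helper firstGroup and the sample string are hypothetical). By default the dot stops at \r as well as \n, so a bare \n never lines up with a CRLF; with Pattern.DOTALL the match succeeds, but a greedy (.*) is then free to run across lines, so the captured groups may need checking.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotLineTerminatorDemo {
    // Returns group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, int flags, String input) {
        Matcher matcher = Pattern.compile(regex, flags).matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String windows = "<word>a</word>\r\n<word>b</word>\r\n";

        // Default flags, bare \n: the dot cannot consume \r, so no match.
        System.out.println(firstGroup("<word>(.*)</word>.*\\n", 0, windows));
        // prints: null

        // Default flags, \r?\n: matches, and the group stays on one line.
        System.out.println(firstGroup("<word>(.*)</word>.*\\r?\\n", 0, windows));
        // prints: a

        // DOTALL, bare \n: matches, but the greedy group spans both lines.
        String greedy = firstGroup("<word>(.*)</word>.*\\n", Pattern.DOTALL, windows);
        System.out.println(greedy.equals("a</word>\r\n<word>b"));
        // prints: true
    }
}
```

If DOTALL is used, reluctant quantifiers such as (.*?) are one way to keep the groups from over-capturing; \r?\n leaves the rest of the pattern's behavior unchanged.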

fge
