
I'm just getting started with regexes, so I might be making some dumb mistakes here. I've got the following regex:

String regex = "length ?= ?\"[\\+-]?\\d+(\\.\\d+)?\".*height ?= ?\"[\\+-]\\d+(\\.\\d+)?\"";

This is matching this file/string properly:

<house length="120" prize="2000000" height="28"/>

but it's not matching

<house prize="2000000"
       length="-1200"
       owner="Smith"
       height="55.8"/>

I find this really weird, since it should be matching the second one too... Any help pointing me in the right direction will be much appreciated!

ParkerHalo

3 Answers


You'll either need to account for the newlines and spaces using a character class (e.g. [\s\n]) within your pattern, or use Pattern.MULTILINE (?m) or Pattern.DOTALL (?s).
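A minimal sketch contrasting these options on the question's second sample. It assumes the `?` missing after the second `[\\+-]` has been restored. The `[\s\S]` class is my own choice of an "any character, newline included" class: a class like `[\s\n]` alone would not match the non-whitespace text between the attributes (e.g. `owner="Smith"`). `MULTILINE` only changes where `^` and `$` match, so by itself it does not let `.` cross line breaks. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class DotModeDemo {
    // The second sample from the question, with its real line breaks.
    static final String INPUT = "<house prize=\"2000000\"\n"
            + "       length=\"-1200\"\n"
            + "       owner=\"Smith\"\n"
            + "       height=\"55.8\"/>";

    // Question's pattern with the missing '?' after the second sign class restored.
    static final String DOT_REGEX =
            "length ?= ?\"[\\+-]?\\d+(\\.\\d+)?\".*height ?= ?\"[\\+-]?\\d+(\\.\\d+)?\"";

    // Same pattern with [\s\S]* instead of .*; [\s\S] matches any character, newlines included.
    static final String CLASS_REGEX =
            "length ?= ?\"[\\+-]?\\d+(\\.\\d+)?\"[\\s\\S]*height ?= ?\"[\\+-]?\\d+(\\.\\d+)?\"";

    // Returns true if the pattern is found anywhere in INPUT.
    static boolean found(String regex, int flags) {
        return Pattern.compile(regex, flags).matcher(INPUT).find();
    }

    public static void main(String[] args) {
        System.out.println("no flags:  " + found(DOT_REGEX, 0));                 // false
        System.out.println("MULTILINE: " + found(DOT_REGEX, Pattern.MULTILINE)); // false
        System.out.println("DOTALL:    " + found(DOT_REGEX, Pattern.DOTALL));    // true
        System.out.println("[\\s\\S]*:  " + found(CLASS_REGEX, 0));              // true
    }
}
```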

l'L'l
  • The multiline isn't going to help him here. – naurel Nov 30 '15 at 09:54
  • @naurel: some explanation might be recommended. – l'L'l Nov 30 '15 at 09:55
  • He's trying to match a pattern across multiple lines with a `.*`. The multiline modifier causes `^` and `$` to match at the beginning and end of each line, which isn't the issue here. Using this method would complicate the regex far more. – naurel Nov 30 '15 at 10:01
  • @naurel: They are not using `^` `$` anchors ... and multiline mode can be used: https://regex101.com/r/yS9zH9/1. Every situation calls for different solutions, and knowing what the options are is important. – l'L'l Nov 30 '15 at 10:05
  • You're not matching the same thing as he is in your example. If you're able to do the same thing as he wants with multiline I would like to learn it. – naurel Nov 30 '15 at 10:14
  • Try it on the site that's linked... that's what it's there for :) – l'L'l Nov 30 '15 at 10:16

This seems to work:

String regex="(.|\\s)* length ?= ?\"[\\+-]?\\d+(\\.\\d+)?\"(.|\\s)*height ?= ?\"[\\+-]?\\d+(\\.\\d+)?\""; 

I replaced . with (.|\s), added it at the beginning, and added the missing ? to [\+-] in the part after height.
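A quick check of this regex (trailing space dropped) using `Matcher.find()` on both samples from the question. It also shows that `String.matches()` returns false even on the one-line sample, because the pattern stops at the closing quote of `height` and never consumes the trailing `/>`. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class AlternationDemo {
    // The regex from this answer, minus the trailing space.
    static final String REGEX =
            "(.|\\s)* length ?= ?\"[\\+-]?\\d+(\\.\\d+)?\"(.|\\s)*height ?= ?\"[\\+-]?\\d+(\\.\\d+)?\"";

    static final String ONE_LINE = "<house length=\"120\" prize=\"2000000\" height=\"28\"/>";
    static final String MULTI_LINE = "<house prize=\"2000000\"\n"
            + "       length=\"-1200\"\n"
            + "       owner=\"Smith\"\n"
            + "       height=\"55.8\"/>";

    // find(): the pattern may match any substring of the input.
    static boolean found(String input) {
        return Pattern.compile(REGEX).matcher(input).find();
    }

    // String.matches(): the pattern must cover the whole input, including the trailing "/>".
    static boolean matchesWhole(String input) {
        return input.matches(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(found(ONE_LINE));        // true
        System.out.println(found(MULTI_LINE));      // true
        System.out.println(matchesWhole(ONE_LINE)); // false: "/>" is never consumed
    }
}
```

Since a space can be matched by either `.` or `\s`, the `(.|\s)*` alternation may backtrack heavily when a match fails on longer input; `[\s\S]*` or the `DOTALL` flag avoids that ambiguity.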


Your problem is that . matches any character except a newline, so .* can't cross the line breaks between your attributes.

You need to add the s (DOTALL) modifier. You can find examples here.

Also, you forgot a ? after your second [\\+-]:

length ?= ?\"[\\+-]?\\d+(\\.\\d+)?\".*height ?= ?\"[\\+-]?\\d+(\\.\\d+)?\"

Example

In Java, that gives you:

Pattern.compile("length ?= ?\"[\\+-]?\\d+(\\.\\d+)?\".*height ?= ?\"[\\+-]?\\d+(\\.\\d+)?\"", Pattern.DOTALL);

Remember that String.matches() doesn't accept flags, but you can put (?s) at the start of the regex to get the same effect.
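A minimal sketch comparing the two ways of enabling the `s` mode with the corrected regex from this answer, plus a `String.matches()` variant. Unlike `Matcher.find()`, `matches()` must cover the entire input, so this sketch pads the regex with `.*` on both sides; that padding is my own addition, not part of the answer. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class DotallDemo {
    // Corrected regex from this answer.
    static final String REGEX =
            "length ?= ?\"[\\+-]?\\d+(\\.\\d+)?\".*height ?= ?\"[\\+-]?\\d+(\\.\\d+)?\"";

    static final String INPUT = "<house prize=\"2000000\"\n"
            + "       length=\"-1200\"\n"
            + "       owner=\"Smith\"\n"
            + "       height=\"55.8\"/>";

    // Option 1: pass the DOTALL flag to Pattern.compile.
    static boolean withFlag() {
        return Pattern.compile(REGEX, Pattern.DOTALL).matcher(INPUT).find();
    }

    // Option 2: embed the same mode as (?s) at the start of the regex.
    static boolean withInlineFlag() {
        return Pattern.compile("(?s)" + REGEX).matcher(INPUT).find();
    }

    // String.matches() takes no flags and must cover the whole input,
    // so (?s) goes in front and .* pads both ends.
    static boolean withStringMatches() {
        return INPUT.matches("(?s).*" + REGEX + ".*");
    }

    public static void main(String[] args) {
        System.out.println(withFlag());          // true
        System.out.println(withInlineFlag());    // true
        System.out.println(withStringMatches()); // true
    }
}
```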

naurel
  • Ah OK, so the problem is that there are newlines between my attributes and `.` doesn't match newlines! From the docs I see there are 2 ways to apply the `s` modifier: 1. writing `(?s)` in front of my regex, and 2. calling the compile method as `Pattern.compile(someStr, Pattern.DOTALL);`. Is either of those ways recommended over the other? – ParkerHalo Nov 30 '15 at 10:03
  • I would suggest using `Pattern.compile(someStr, Pattern.DOTALL);` since matching modes are supposed to be specified outside the regular expression. But I don't know if there is any real impact. – naurel Nov 30 '15 at 10:11
  • Thanks! This really helped me! (And I feel like a derp right now ;)) – ParkerHalo Nov 30 '15 at 10:12