1

I am struggling to get the following regular expression (in Java) to work. I want to check whether a string contains a year. The strings can look like

Mar 3, 2014 

or sometimes with a closing parenthesis such as

Mar 3, 2014)

I am using

text.matches("\\b((19|20)\\d{2})(\\)?)\\b")

which works in most cases, but does not match if the string ends at the parenthesis. If I use

text.matches("\\b((19|20)\\d{2})(\\)?)$") 

it matches text that ends right after the parenthesis, but not a string that has another space after it.

I thought that \b would include end of string, but cannot get it to work.

I know I can use two regexes, but that seems really ugly.

Luiggi Mendoza
Fred Andrews
  • Try enabling [multiline mode](http://stackoverflow.com/questions/3651725/match-multiline-text-using-regular-expression) so that `$` also matches at line breaks, not just at the end of the string. Then you should be able to use your second expression. – Sam Jul 17 '14 at 21:42
  • can you tell us what are all possibilities? – Braj Jul 17 '14 at 21:48

3 Answers

1

Your main problem is that matches checks whether the entire string matches the regex. What you want is to test whether the string contains a substring that the regex can match. To do so, use

Pattern p = Pattern.compile(yourRegex);
Matcher m = p.matcher(stringYouWantToTest);
if (m.find()){
    //tested string contains part which can be matched by regex
}else{
    //part which could be matched by regex couldn't be found
}

You can also surround your regex with .* so that it matches the characters around the part you want to find, and keep using matches as you do now,

if(yourString.matches(".*"+yourRegex+".*"))

but this will have to iterate over the entire string.


In other words, you can try to find \\b(19|20)\\d{2}\\b using Pattern/Matcher, or use something like matches(".*\\b(19|20)\\d{2}\\b.*").
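As a minimal sketch of the difference between the two approaches (the class and method names here are made up for illustration, they are not from the question), something like this shows find() looking for a substring while matches() requires the whole input to be a year:

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    // Year pattern from this answer: 19xx or 20xx delimited by word boundaries.
    static final Pattern YEAR = Pattern.compile("\\b(19|20)\\d{2}\\b");

    // True if some substring of text is matched by the pattern.
    static boolean containsYear(String text) {
        return YEAR.matcher(text).find();
    }

    // True only if the entire text is matched by the pattern.
    static boolean isExactlyYear(String text) {
        return YEAR.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsYear("Mar 3, 2014"));   // true
        System.out.println(containsYear("Mar 3, 2014)"));  // true
        System.out.println(containsYear("20147777"));      // false
        System.out.println(isExactlyYear("Mar 3, 2014"));  // false
        System.out.println(isExactlyYear("2014"));         // true
    }
}
```

Compiling the Pattern once and reusing it also avoids recompiling the regex on every call, which String.matches does internally.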

BTW, a parenthesis ) is not included in the \w class, so \b treats the position between a \w character and ) as a word boundary. For instance, "9)" matches the regex \d\b\).
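To illustrate the boundary point, here is a rough sketch (hypothetical class and constant names) comparing the two placements of \b around the optional parenthesis. It uses matches() on short inputs so that the effect at the end of the string is visible: after ")" at the end of input there is no word character on either side, so \b cannot succeed there.

```java
import java.util.regex.Pattern;

public class BoundaryPlacement {
    // Boundary after the optional parenthesis, as in the question.
    static final Pattern BOUNDARY_AFTER = Pattern.compile("\\b(19|20)\\d{2}\\)?\\b");
    // Boundary before the optional parenthesis.
    static final Pattern BOUNDARY_BEFORE = Pattern.compile("\\b(19|20)\\d{2}\\b\\)?");

    // True only if the pattern matches the entire input.
    static boolean wholeMatch(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        // ")" followed by end of input is not a word boundary.
        System.out.println(wholeMatch(BOUNDARY_AFTER, "2014)"));     // false
        // The boundary sits between "4" and ")", then ")" is consumed.
        System.out.println(wholeMatch(BOUNDARY_BEFORE, "2014)"));    // true
        // Without the parenthesis both placements behave the same.
        System.out.println(wholeMatch(BOUNDARY_AFTER, "2014"));      // true
        System.out.println(wholeMatch(BOUNDARY_BEFORE, "2014"));     // true
        // Longer digit runs are still rejected.
        System.out.println(wholeMatch(BOUNDARY_BEFORE, "20147777")); // false
    }
}
```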

Pshemo
  • That last BTW turned out to be the answer! Putting the \b before the \)? instead of after did exactly what I wanted. It becomes ((?:19|20)(?:\d){2})\b\)? – Fred Andrews Jul 17 '14 at 22:06
0

Your question isn't very clear, but from what I understand, this should work for you:

text.matches("((?:19|20)(?:\\d){2})\\)?");

Demo: http://regex101.com/r/lO0aH4/3

Michael Parker
  • That type of regex does not check for word endings, so that strings like "20147777" are also picked up. I am trying to find a regex that will get both end of word and end of string. Until I added the check for the optional ending ")", \b worked just fine. – Fred Andrews Jul 17 '14 at 21:48
0

You could try something like:

".*(19|20)[0-9]{2}\\)?$"

I'm not sure this will help; it would be better to give us a complete example of a string to match. Must the string end with a year (with an optional parenthesis), or may something else come after it?

Idriss Neumann
  • The ending of the string is completely arbitrary. I was checking for a 4 digit year surrounded by word boundaries, and that was working well. I found that the check was getting fooled by the parenthesis and when I started checking for that, then the word boundary stopped working at end of string. – Fred Andrews Jul 17 '14 at 21:56