
While doing some number validation, there is one case where I want to exclude a number: when a hyphen comes directly before its four digits.

To simplify my regular expression, let's only worry about those 4 digits.

Since I'm using JavaScript, I can't use lookbehinds.

In an attempt to use a negative lookahead to match anything not containing a hyphen, I came up with:

((?!-).)\d{4}

My test data is below, bolded are the matches:

2014
1106 **2014** **9899**
**11500**

234-233-2014
234-234-1100
-1100

My expectation is that 2014, 1106, 2014 and 9899 match, whereas 11500 does not. I know the issue is with the period: it matches any character except line breaks, so it consumes one character in front of the four digits. I am also trying to account for line breaks as I apply word boundaries to my regular expression.
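To make the behaviour of the period concrete, here is a minimal sketch that runs the attempt against the test data. The variable names (`sample`, `attempt`, `found`) are only illustrative, and the data is assumed to be separated by line breaks as shown above.

```javascript
// Test data from the question, one entry per line (assumed layout).
const sample = [
  "2014",
  "1106 2014 9899",
  "11500",
  "",
  "234-233-2014",
  "234-234-1100",
  "-1100",
].join("\n");

// The original attempt: one non-hyphen character, then four digits.
const attempt = /((?!-).)\d{4}/g;

// Every match is five characters long: the character consumed by `.`
// plus the four digits that follow it.
const found = sample.match(attempt);
console.log(found); // [ ' 2014', ' 9899', '11500' ]
```

Numbers at the start of a line are missed because there is no character in front of them for `.` to consume (it does not match a line break), while `11500` is accepted because its first `1` plays the role of that extra character.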

Is there a better solution that matches only a 4-digit number that does not follow a hyphen, or that simply excludes any match preceded by a hyphen?

signus

4 Answers


Through regex only,

(?:(?!\b-\b|-\b)(?:.|^))\b(\d{4})\b

Get the numbers from group index 1.


And your js code would be,

> console.log(text.match(/(?:(?!\b-\b|-\b)(?:.|^))\b(\d{4})\b/g));
[ '2014', ' 1106', ' 2014', ' 9899' ]

OR

> function getMatches(string, regex, index) {
... index || (index = 1);
... var matches = [];
... var match;
...     while (match = regex.exec(string)) {
.....         matches.push(match[index]);
.....     }
... console.log(matches);
... }
undefined
> var matches = getMatches(text, re, 1);
[ '2014', '1106', '2014', '9899' ]

Code stolen from here :-)
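For a version that runs outside the REPL, here is a self-contained sketch of the same idea. It assumes the test data is joined with spaces on a single line (the leading spaces in the first output above suggest that), and it uses `String.prototype.matchAll`, which is a newer addition and not part of the original answer. The names `text`, `re` and `numbers` are illustrative.

```javascript
// Test data joined with spaces (an assumption about how `text` was built).
const text = "2014 1106 2014 9899 11500 234-233-2014 234-234-1100 -1100";

// Same pattern as above; the digits are in capture group 1.
const re = /(?:(?!\b-\b|-\b)(?:.|^))\b(\d{4})\b/g;

// matchAll needs the g flag and yields one match object per hit;
// index 1 of each match object is the first capture group.
const numbers = Array.from(text.matchAll(re), function (m) {
  return m[1];
});

console.log(numbers); // [ '2014', '1106', '2014', '9899' ]
```

Reading group 1 rather than the whole match is what drops the leading space that the non-capturing part consumes.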

Avinash Raj
  • In your case, you need only one line `console.log(text.match(/(?:(?!\b-\b|-\b)(?:.|^))\b(\d{4})\b/g));` ;) [jsBin](http://jsbin.com/zafixusejafo/1/edit?html,js,output) – hex494D49 Aug 26 '14 at 01:18
  • @hex494D49 So the match function gives first preference to groups. Am I correct? – Avinash Raj Aug 26 '14 at 01:45
  • Yes, `match()` returns an array containing the matches. In case you have more groups, match[0] will contain the whole match. But in this case match[0] = 2014, match[1] = 1106 ... – hex494D49 Aug 26 '14 at 01:54
  • Nicely done, and showing the matches in JS is a nice step, thanks! – signus Aug 26 '14 at 17:15

Although this doubles up your searches, you can do a lookahead with both a positive and negative component to it:

(?=(?!-)\d{4})\b\d{4,}\b

This regex101 example doesn't capture the numbers, whereas this regex101 example does.

OnlineCop
  • Since the OP is looking to match digits, wouldn't `\D` be better than `\b`? – RobG Aug 25 '14 at 23:35
  • @RobG: `\b` is a zero-length word boundary match, where `\D` will actually match a single non-digit character (it is equivalent to `[^0-9]`, which will also include the `-` that the OP wishes to exclude). – OnlineCop Aug 25 '14 at 23:41
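The difference between `\b` and `\D` described in these comments can be seen in a small sketch. The input string and variable names are made up for illustration.

```javascript
// Illustrative input: a hyphenated number followed by a plain one.
const line = "234-233-2014 1106";

// \b is zero-width, so each match contains only the four digits.
// Note that \b on its own does not reject the hyphen in front of 2014.
const withBoundary = line.match(/\b\d{4}\b/g);

// \D consumes a real character on each side, so the hyphen and the
// space become part of the match, and 1106 is lost: the space before it
// was already consumed and nothing follows it at the end of the string.
const withNonDigit = line.match(/\D\d{4}\D/g);

console.log(withBoundary); // [ '2014', '1106' ]
console.log(withNonDigit); // [ '-2014 ' ]
```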

This is a workaround in JavaScript using replace()

var text = "2014 \
1106 2014 9899 \
 11500 \
\
234-233-2014 \
234-234-1100 \
-1100";

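var a = [];
// Match four-digit numbers together with an optional leading hyphen,
// then keep only the matches that did not pick up a hyphen.
text.replace(/(-?\b\d{4}\b)/g, function(m){
  if(!m.match(/-/g)) a.push(m);
});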

console.log(a); 

Output:

["2014", "1106", "2014", "9899"] 



Previous attempt (using look-behind which isn't supported in JavaScript)

/(?<!-)\b(\d{4})\b/g
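Engines that implement ES2018 or later do accept lookbehind assertions, so where such an engine can be assumed (which was not the case when this was written), the pattern above can be used directly. A minimal sketch, with illustrative names and the test data joined on one line:

```javascript
// Assumes an ES2018-capable engine; older engines throw a SyntaxError
// when they parse the lookbehind.
const input = "2014 1106 2014 9899 11500 234-233-2014 234-234-1100 -1100";

// Four digits on word boundaries, not immediately preceded by a hyphen.
const noHyphenBefore = /(?<!-)\b\d{4}\b/g;

const result = input.match(noHyphenBefore);
console.log(result); // [ '2014', '1106', '2014', '9899' ]
```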


hex494D49

Match either four digits that are the beginning of a line, or four digits that don't come after a hyphen:

/[^-]\b\d{4}\b|^\b\d{4}\b/
sahbeewah
  • I would match at the beginning of the line if every instance were like that. However, the issue with `[^-]` is that it matches any character except the hyphen, meaning it consumes line breaks, white space, etc., before it matches the 4 digits. – signus Aug 25 '14 at 23:25
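One possible way around the concern in this comment is to let `[^-]` consume the extra character but read the digits from a capture group. The sketch below is an adaptation, not the answer's original pattern: the capture group and the `g` and `m` flags are additions, and the names are illustrative.

```javascript
// Test data from the question, one entry per line (assumed layout).
const data = [
  "2014",
  "1106 2014 9899",
  "11500",
  "",
  "234-233-2014",
  "234-234-1100",
  "-1100",
].join("\n");

// Either the start of a line (m flag) or any single non-hyphen character,
// followed by exactly four digits on word boundaries, captured in group 1.
const pattern = /(?:^|[^-])\b(\d{4})\b/gm;

const digitsOnly = [];
let hit;
while ((hit = pattern.exec(data)) !== null) {
  // hit[0] may include a leading space or line break; hit[1] never does.
  digitsOnly.push(hit[1]);
}

console.log(digitsOnly); // [ '2014', '1106', '2014', '9899' ]
```

The whole match can still contain the preceding whitespace or line break, so this only helps when the caller reads group 1.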