
I need help with a regular expression.

Using JavaScript, I am going through each line of a text file, and I want to replace any match of [0-9]{6,9} with a '*', but I don't want to replace numbers that start with the prefix 100. So a number like 1110022 should be replaced (matched), but 1004567 should not (no match).

I need a single expression that will do the trick (just the matching part). I can’t use ^ or $ because the number can appear in the middle of the line.

I have tried (?!100)[0-9]{6,9}, but it doesn't work.

More examples:

Don't match: 10012345

Match: 1045677

Don't match:

1004567

Don't match: num="10034567" test

Match just the middle number in the line: num="10048876" 1200476, 1008888

Thanks

Asaf


You need to use a leading word boundary to check whether a number starts with a specific digit sequence:

\b(?!100)\d{6,9}


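Here, the lookahead checks for 100 only right after a word boundary, that is, at the start of a digit run, never inside a number. Without \b, the engine simply retries at the next position: in 1004567 the lookahead fails at the first digit, but at the second digit it sees 004, which is not 100, so [0-9]{6,9} matches 004567.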
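A minimal sketch, assuming any JavaScript engine (Node.js or a browser console), that compares the pattern from the question with the word-boundary version on the number 1004567. The variable names are illustrative only.

```javascript
// Pattern from the question: no boundary, so the lookahead can be retried
// at any position inside a digit run.
const unanchored = /(?!100)[0-9]{6,9}/g;
// Pattern from the answer: \b pins the lookahead to the start of the digit run.
const anchored = /\b(?!100)\d{6,9}/g;

const sample = '1004567';

// At index 0 the lookahead sees "100" and fails; at index 1 it sees "004",
// so six digits ("004567") are matched.
const loose = sample.match(unanchored);

// \b only holds before the first digit (and at the very end, where no digit
// follows), so the lookahead always inspects the real prefix and nothing matches.
const strict = sample.match(anchored);

console.log(JSON.stringify(loose)); // ["004567"]
console.log(strict);                // null
```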

If you need to replace each match with a single asterisk, use "*" as the replacement string (see the snippet right below).

// Leading \b, then a negative lookahead for 100, then 6 to 9 digits
var re = /\b(?!100)\d{6,9}/g;
var str = 'Don\'t match: 10012345\n\nMatch: 1045677\n\nDon\'t match:\n\n1004567\n\nDon\'t match: num="10034567" test\n\nMatch just the middle number in the line: num="10048876" 1200476, 1008888';
// Replace every match with a single "*" and display the result
document.getElementById("r").innerHTML = "<pre>" + str.replace(re, '*') + "</pre>";
<div id="r"></div>
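Since the question walks through a text file line by line, here is a sketch that applies the same regex to each line separately. `maskLine` is a hypothetical helper name, and reading the file itself is left out; the lines are assumed to already be in an array.

```javascript
// Hypothetical helper: mask every 6-9 digit number in one line unless it starts with 100.
function maskLine(line) {
  // The g flag makes replace() substitute every match in the line, not just the first.
  return line.replace(/\b(?!100)\d{6,9}/g, '*');
}

// Sample lines taken from the question's examples.
const lines = [
  "Don't match: 10012345",
  'Match: 1045677',
  '1004567',
  'Don\'t match: num="10034567" test',
  'num="10048876" 1200476, 1008888',
];

const masked = lines.map(maskLine);
console.log(masked.join('\n'));
// Don't match: 10012345
// Match: *
// 1004567
// Don't match: num="10034567" test
// num="10048876" *, 1008888
```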

Or, if you need to replace each digit with *, pass a callback function to replace:

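// Only define repeat() if the engine lacks the native ES2015 String.prototype.repeat
if (!String.prototype.repeat) {
    String.prototype.repeat = function (n) {
        return n > 0 ? this + this.repeat(n - 1) : '';
    };
}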

var re = /\b(?!100)\d{6,9}/g;
var str = '123456789012 \nDon\'t match: 10012345\n\nMatch: 1045677\n\nDon\'t match:\n\n1004567\n\nDon\'t match: num="10034567" test\n\nMatch just the middle number in the line: num="10048876" 1200476, 1008888';
// The callback receives each match and returns one "*" per matched digit
document.getElementById("r").innerHTML = "<pre>" + str.replace(re, function(m) { return "*".repeat(m.length); }) + "</pre>";
<div id="r"></div>

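The repeat polyfill is adapted from BitOfUniverse's answer; current engines provide String.prototype.repeat natively, so it is only needed in very old browsers.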
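In engines that support ES2015, `String.prototype.repeat` is built in, so the per-digit replacement can be written without touching `String.prototype` at all. A minimal sketch, assuming such an engine; `maskDigits` is a hypothetical name.

```javascript
// Hypothetical helper: replace each digit of a qualifying number with "*".
function maskDigits(text) {
  // The callback receives the matched substring; native repeat() builds
  // a run of asterisks of the same length.
  return text.replace(/\b(?!100)\d{6,9}/g, (m) => '*'.repeat(m.length));
}

console.log(maskDigits('123456789012'));           // *********012
console.log(maskDigits('num="10048876" 1200476')); // num="10048876" *******
```

Only the first nine digits of the 12-digit number are masked, because \d{6,9} stops at nine and no trailing boundary is required.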

Wiktor Stribiżew
  • Isn't that what you need? It is meaningless to set any length restrictions if you do not specify boundaries. My code handles all the test cases in your question. – Wiktor Stribiżew Jan 26 '16 at 13:45
  • Thanks for the answer. The problem is that with word boundaries only the entire number can match. [0-9]{6,9} allows me to leave the tail of the number, meaning 123456789012 will become *012, and this is what I want. But \b[0-9]{6,9}\b will not match 123456789012. Is there a way to exclude the 100 prefix without using a word boundary? Sorry for being so difficult. Thanks – Asaf Jan 26 '16 at 13:55
  • Just use the leading boundary then.
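To illustrate the difference discussed in these comments, a short sketch comparing the leading-boundary pattern with a version that also has a trailing \b, on a line containing a 12-digit number. The sample line is made up for illustration.

```javascript
// Leading boundary only: a match may stop in the middle of a longer digit run.
const leadingOnly = /\b(?!100)\d{6,9}/g;
// Boundaries on both ends: the whole digit run must be 6-9 digits long.
const bothEnds = /\b(?!100)\d{6,9}\b/g;

const line = '123456789012 1004567 1045677';

// The 12-digit number loses its first nine digits, leaving the tail "012".
const a = line.replace(leadingOnly, '*');
// The 12-digit number is too long to satisfy the trailing \b, so it is left alone.
const b = line.replace(bothEnds, '*');

console.log(a); // *012 1004567 *
console.log(b); // 123456789012 1004567 *
```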