I have a big Java regex pattern composed of multiple subpatterns joined by alternation (|). I want to allow multiple delimiters anywhere in between the digits.

For example, given the pattern "(3[47][0-9]{13})|(56022[1-5][0-9]{10}|(5610)[0-9]{12})", how do I allow the following delimiters: equals sign (=), backslash (\), dot (.), hyphen (-) and whitespace ( )?

These delimiters can appear anywhere between the digits that match the numeric pattern (but not at the start or end), any number of times.

Micha Wiedenmann
pikaraider

1 Answer

Insert the pattern [\s=\\.-]* (it matches zero or more whitespace characters, =, \, . and -) between all digit-matching patterns, and convert each \d{X} into \d(?:[\s=\\.-]*\d){X-1}:

(3[\s=\\.-]*[47][\s=\\.-]*[0-9](?:[\s=\\.-]*[0-9]){12})|(5[\s=\\.-]*6[\s=\\.-]*0[\s=\\.-]*2[\s=\\.-]*2[\s=\\.-]*[1-5][\s=\\.-]*[0-9](?:[\s=\\.-]*[0-9]){9}|(5[\s=\\.-]*6[\s=\\.-]*1[\s=\\.-]*0)[\s=\\.-]*[0-9](?:[\s=\\.-]*[0-9]){11})

See the regex demo

Do not forget to double the backslashes when using the pattern inside a Java string literal:

String part_of_regex = "(3[\\s=\\\\.-]*[47][\\s=\\\\.-]*[0-9](?:[\\s=\\\\.-]*[0-9]){12})|(5[\\s=\\\\.-]*6[\\s=\\\\.-]*0[\\s=\\\\.-]*2[\\s=\\\\.-]*2[\\s=\\\\.-]*[1-5][\\s=\\\\.-]*[0-9](?:[\\s=\\\\.-]*[0-9]){9}|(5[\\s=\\\\.-]*6[\\s=\\\\.-]*1[\\s=\\\\.-]*0)[\\s=\\\\.-]*[0-9](?:[\\s=\\\\.-]*[0-9]){11})";
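If writing the delimiter class by hand everywhere is impractical, one option is to assemble the string from small pieces. The following is only a sketch: the class name `DelimitedPattern` and the helpers `join` and `digits` are hypothetical names introduced for illustration, and it assumes the pattern can be written as a list of single-digit sub-patterns rather than parsed from an existing regex. It produces the same string as the literal above.

```java
import java.util.regex.Pattern;

public class DelimitedPattern {
    // Delimiter class from the answer: zero or more whitespace, '=', '\', '.' or '-'.
    static final String DELIM = "[\\s=\\\\.-]*";

    // Hypothetical helper: joins sub-patterns, putting DELIM between each pair.
    static String join(String... atoms) {
        return String.join(DELIM, atoms);
    }

    // Hypothetical helper: n digits with optional delimiters between them,
    // i.e. the [0-9](?:DELIM[0-9]){n-1} form described in the answer.
    static String digits(int n) {
        return "[0-9](?:" + DELIM + "[0-9]){" + (n - 1) + "}";
    }

    // Same three alternatives as the original numeric pattern.
    static final String REGEX =
        "(" + join("3", "[47]", digits(13)) + ")"
        + "|(" + join("5", "6", "0", "2", "2", "[1-5]", digits(10))
        + "|(" + join("5", "6", "1", "0") + ")" + DELIM + digits(12) + ")";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        // Both the delimited and the plain form are expected to match.
        System.out.println(p.matcher("3788-6863-7988-407").matches());
        System.out.println(p.matcher("378868637988407").matches());
    }
}
```

This does not rewrite an arbitrary existing regex (that would still need a regex parser); it only avoids repeating the delimiter class when the pattern is built in code.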
Wiktor Stribiżew
  • Thanks Wiktor. Is there any shortcut? For example, I have this delimiter pattern separately and somehow I can apply it programmatically to the numeric pattern in java. As I said, it's part of a far bigger regex and I can't practically put ```[\s=\\.-]*``` in between all digit matching patterns. – pikaraider Jul 04 '19 at 12:04
  • @pikaraider It can be done dynamically on a literal string, not on a regex pattern. In the latter case, you need a regex parser. – Wiktor Stribiżew Jul 04 '19 at 12:08
  • Okay. Assuming I have a string instead of a pattern, how can I insert your delimiter string pattern in between all digit matching patterns. I can always convert the literal string to a pattern. – pikaraider Jul 04 '19 at 12:12
  • Show an example and expected output please. – Wiktor Stribiżew Jul 04 '19 at 12:14
  • This is the literal string containing the numeric pattern - ```(3[47][0-9]{13})|(56022[1-5][0-9]{10}|(5610)[0-9]{12})```. I want the delimiter string ```[\s=\\.-]*``` to be placed in between all digit matching patterns in the numeric literal string and final result should be a string, which I can then convert into a regex pattern. – pikaraider Jul 04 '19 at 12:18
  • `(3[47][0-9]{13})|(56022[1-5][0-9]{10}|(5610)[0-9]{12})` is not a literal string, it is a regex pattern. A literal string is `12243`, `hello world`, but `\d{12}` is a regex pattern and you need a regex parser. – Wiktor Stribiżew Jul 04 '19 at 12:18
  • Thank you for making it clear. Basically, I want to be able to find all the strings in a line matching the numeric regex pattern that I have. It should handle the presence of multiple delimiters as I have asked in my question. Once the input string matches the numeric pattern (which handles delimiters), I want to replace only the numbers with a special character (say, *). Delimiters should be left alone as they are. – pikaraider Jul 04 '19 at 12:24
  • @pikaraider Sounds like you need a `Matcher.appendReplacement`, see [this answer](https://stackoverflow.com/a/377484/3832970). – Wiktor Stribiżew Jul 04 '19 at 12:29
  • If my subjectString is ```3788-6863-7988-407 valid 378868637988408```, the resultString according to the answer in above link is ```* valid *```. Whereas what I want is ```****-****-****-*** valid ***************```. Also, could you please let me know how I can dynamically add your delimiter pattern in between all digit matching patterns using a regex parser so that I get a pattern which handles what I want. – pikaraider Jul 04 '19 at 12:44
  • @pikaraider You should search for regex parsers yourself, that will make the question too broad. Use `appendReplacement` to replace all digits with `*`, that is simple. – Wiktor Stribiżew Jul 04 '19 at 12:46
  • ```String str = "random valid 4035300539804084 random invalid 4035300539804085"``` ```Expected output = "random valid **************** random invalid 4035300539804085"``` ```Matcher m = customPattern.matcher(str); while(m.find()) { if (myMethod(m.group())) { s = s.replaceAll("[0-9]", "*") } }``` I want to replace numbers with * only when myMethod() returns true. Else numbers should not get replaced. But my above code is replacing all the numbers in str with *. Please help. – pikaraider Jul 04 '19 at 13:36
  • @pikaraider Follow [**this solution**](https://stackoverflow.com/a/377484/3832970) – Wiktor Stribiżew Jul 04 '19 at 13:42
  • Thanks. How would I apply the same delimiters ```[\s=\\.-]*``` to this pattern ```(?!000|666)[0-8][0-9]{2}(?!00)[0-9]{2}(?!0000)[0-9]{4}``` ? – pikaraider Jul 04 '19 at 14:30
  • @pikaraider The same way as I described in the answer: `(?!0[\s=\\.-]*0[\s=\\.-]*0|6[\s=\\.-]*6[\s=\\.-]*6)[0-8][\s=\\.-]*[0-9][\s=\\.-]*[0-9][\s=\\.-]*(?!0[\s=\\.-]*0)[0-9][\s=\\.-]*[0-9][\s=\\.-]*(?!0[\s=\\.-]*0[\s=\\.-]*0[\s=\\.-]*0)[0-9](?:[\s=\\.-]*[0-9]){3}` – Wiktor Stribiżew Jul 04 '19 at 14:32
  • It should match only 9 digit numbers which follow the pattern ```(?!000|666)[0-8][0-9]{2}(?!00)[0-9]{2}(?!0000)[0-9]{4}``` with any number of delimiters in between. However, the one you have written matches ```3707- 8970- 9084 \.-=107``` till the first 9 digits. A sample input that should match is ```0 01-45-6.7\8=9``` . I think my regex is itself a bit wrong. Basically I want to match SSN numbers. The rules are here: https://pastebin.com/cak8nigP – pikaraider Jul 04 '19 at 14:53
  • @pikaraider To match as whole words, use word boundaries, `\b`. `\byour_pattern\b`. – Wiktor Stribiżew Jul 04 '19 at 19:34
  • I have a string in which I want to star all occurrences of the pattern. I have the following code: https://pastebin.com/ivWCwC8E My code works for all the cases except when there are multiple matches in a single line (line 5). How do I resolve this? Please help. – pikaraider Jul 07 '19 at 15:06
  • How to exclude a pattern from matching? For example, if I have ```(\b3[\s=\\.-]*[47][\s=\\.-]*[0-9](?:[\s=\\.-]*[0-9]){12}\b)``` and want to exclude ```\b6011[0-9]{12}\b``` , how can I achieve it? I tried (pattern &^ exclusionPattern) but it doesn't work. – pikaraider Jul 10 '19 at 09:07
  • @pikaraider I am sorry I cannot help right now as I am on a business trip with little amount of time I can devote to SO. Note it is hard to "exclude" a pattern from a pattern that matches chars of mixed type as you cannot safely use word boundaries. If you think you can rely on word boundaries, you may use `\b(?!your_excluded_pattern)your_generic_pattern\b` – Wiktor Stribiżew Jul 10 '19 at 16:52
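
The masking discussed in the comments (replace only the digits of each match with `*`, leave the delimiters alone, and skip matches that fail a check) can be sketched with `Matcher.appendReplacement`. This is an illustration under assumptions, not the code from the linked answer: the class `DigitMasker`, the method `mask`, and the `accept` predicate (standing in for `myMethod`) are hypothetical names, and `CARD` uses only the first alternative of the pattern, wrapped in `\b` as suggested above.

```java
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitMasker {
    static final String DELIM = "[\\s=\\\\.-]*";

    // First alternative of the answer's pattern, with word boundaries added.
    static final Pattern CARD = Pattern.compile(
        "\\b3" + DELIM + "[47]" + DELIM + "[0-9](?:" + DELIM + "[0-9]){12}\\b");

    // Masks the digits of every match that the predicate accepts;
    // rejected matches and all non-matching text are copied unchanged.
    static String mask(String input, Pattern pattern, Predicate<String> accept) {
        Matcher m = pattern.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String match = m.group();
            // Replace digits inside this match only, not in the whole input.
            String replacement = accept.test(match)
                ? match.replaceAll("[0-9]", "*")
                : match;
            // quoteReplacement keeps '\' and '$' in the match literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "3788-6863-7988-407 valid 378868637988408";
        // Accept every match here; a real check would go in the predicate.
        System.out.println(mask(s, CARD, match -> true));
    }
}
```

Calling `replaceAll` on `m.group()` rather than on the whole input string is what keeps unrelated or rejected numbers from being starred, and because each match is handled inside the `find()` loop, several matches on one line are all processed.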