-1

I am using a regex to match incoming file content and detect an ID that has the following pattern:

AXXXXXXXXXX-MID-XX (where each X is a digit: 10 digits after the A and 2 digits after -MID-)

Here's my regex: (.|\n|\r)*(A[0-9]{10}-MID-[0-9]{2})(.|\n|\r)*

But when the content exceeds roughly 1500 characters, I get a StackOverflowError.

Does this look like something that can be optimized?

Here's the Java code:

String pattern1 = "(.|\n|\r)*(A[0-9]{10}-MID-[0-9]{2})(.|\n|\r)*";
if (file_content.matches(pattern1)) {
    // ...Do something <-- The code never reaches here.
}
Techidiot
  • You don't really need the `(.|\n|\r)`, just use `A[0-9]{10}-MID-[0-9]{2}`. If it matches, select all text. If you absolutely must, use a more concise and better performing version of it by either replacing `(.|\n|\r)` with `[\s\S]` or `.` with `Pattern.DOTALL` flag – ctwheels Oct 23 '20 at 17:01
  • Sounds like you only need to execute `A[0-9]{10}-MID-[0-9]{2}`. Everything before and after that is completely unnecessary. By supplying `(.|\n|\r)*` at the front and end you are effectively reading the entire file into regex capture groups and consuming way more memory than you need to. – MonkeyZeus Oct 23 '20 at 17:15
  • @MonkeyZeus Thank you! Understood the "why not to use" part. – Techidiot Oct 27 '20 at 12:09
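
To illustrate the `Pattern.DOTALL` variant mentioned in the comments, here is a minimal sketch for the case where a whole-input `matches()` call really is required. The class name `DotAllMatch`, the method `containsId`, and the filler input are made up for illustration; only the ID pattern comes from the question.

```java
import java.util.regex.Pattern;

class DotAllMatch {
    // With DOTALL, '.' also matches line terminators, so the
    // (.|\n|\r) alternation from the question is not needed.
    private static final Pattern WHOLE_INPUT =
            Pattern.compile(".*A[0-9]{10}-MID-[0-9]{2}.*", Pattern.DOTALL);

    // Hypothetical helper: true if the whole content matches the pattern,
    // i.e. the content contains an ID somewhere.
    static boolean containsId(String content) {
        return WHOLE_INPUT.matcher(content).matches();
    }

    public static void main(String[] args) {
        // Build multi-line input well past the ~1500 characters from the question.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            sb.append("filler line ").append(i).append("\r\n");
        }
        sb.append("ref: A1234567890-MID-12\r\n");

        System.out.println(containsId(sb.toString()));
        System.out.println(containsId("no id here\nA123-MID-12"));
    }
}
```

This keeps the `matches()` semantics of the original code, but searching with `find()` for the bare ID pattern is still the simpler option.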

2 Answers

0

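The leading and trailing (.|\n|\r)* are unnecessary, and they are what triggers the StackOverflowError: Java's regex engine processes a repeated group containing an alternation recursively, so the stack depth grows with the length of the input. They were only needed because matches() has to match the entire input. Drop them and use find(), which searches for the ID anywhere in the content. Given below is a cleaner and more performant way of doing it: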

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("A[0-9]{10}-MID-[0-9]{2}");
        String fileContent = "A1234567890-MID-12";// Replace it with actual content
        Matcher matcher = pattern.matcher(fileContent);
        while (matcher.find()) {
            String id = matcher.group();// If you want to grab the ID
            System.out.println(id);
            // ...do something
        }
    }
}

Output:

A1234567890-MID-12
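
If only a yes/no answer is needed, as in the `if` from the question, the same pattern can be wrapped in a small helper. This is a sketch under assumptions: the class name `IdDetector` and the methods `containsId` and `findIds` are hypothetical names, and the sample content is generated just to exceed the 1500 characters mentioned in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdDetector {
    // Compile once and reuse; only the ID itself is part of the pattern.
    private static final Pattern ID = Pattern.compile("A[0-9]{10}-MID-[0-9]{2}");

    // Could stand in for file_content.matches(pattern1) from the question:
    // find() looks for the ID anywhere in the content, across line breaks.
    static boolean containsId(String content) {
        return ID.matcher(content).find();
    }

    // Collects every ID found in the content, in order of appearance.
    static List<String> findIds(String content) {
        List<String> ids = new ArrayList<>();
        Matcher matcher = ID.matcher(content);
        while (matcher.find()) {
            ids.add(matcher.group());
        }
        return ids;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 300; i++) {
            sb.append("some unrelated line ").append(i).append('\n');
        }
        sb.append("first: A1234567890-MID-12\n");
        sb.append("second: A0987654321-MID-07\n");
        String fileContent = sb.toString();

        System.out.println(containsId(fileContent));
        System.out.println(findIds(fileContent));
    }
}
```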
Arvind Kumar Avinash
0
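import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Assumption: `lines` holds the file content split into lines (e.g. a List<String>).
Pattern pattern = Pattern.compile("A[0-9]{10}-MID-[0-9]{2}");

for (String line : lines) {
    Matcher matcher = pattern.matcher(line);

    if (matcher.find()) {
        String id = matcher.group(); // first ID on this line
        // ...do something with the ID
    }
}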
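
The loop above leaves open where `lines` comes from. One possible way to supply them, sketched here as an assumption rather than as part of the answer, is to read the source line by line and stop at the first hit. The class name `LineScanner` and the method `firstId` are hypothetical; a `StringReader` stands in for the real file so the example is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineScanner {
    private static final Pattern ID = Pattern.compile("A[0-9]{10}-MID-[0-9]{2}");

    // Reads the source one line at a time and returns the first ID found, if any.
    // The ID contains no line breaks, so scanning per line cannot split it.
    static Optional<String> firstId(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher matcher = ID.matcher(line);
                if (matcher.find()) {
                    return Optional.of(matcher.group());
                }
            }
            return Optional.empty();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the incoming file; a FileReader or similar would go here.
        Reader source = new StringReader("header\nref A1234567890-MID-12\nfooter\n");
        System.out.println(firstId(source).orElse("no ID found"));
    }
}
```

Reading line by line means only one line is held in memory at a time, which may matter for large files.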
Oleg Cherednik