I am trying to use a multi-line regex to match all wildcards in a given source string. These strings can exceed 70,000 lines, with one item per line.
Processing times with my current regex are very long, and I can only assume this is because the pattern is poorly constructed and inefficient. When I run the code on my phone, it seems to run for an eternity.
My current regex:
(?im)(?=^(?:\*|.+\*$))^(?:\*[.-]?)?(?:(?!-)[a-z0-9-]+(?:(?<!-)\.)?)+(?:[a-z0-9]+)(?:[.-]?\*)?$
Valid wildcard examples:
*test.com
*.test.com
*test
test.*
test*
*test*
I compile the pattern with:
private static final String WILDCARD_PATTERN = "(?im)(?=^(?:\\*|.+\\*$))^(?:\\*[.-]?)?(?:(?!-)[a-z0-9-]+(?:(?<!-)\\.)?)+(?:[a-z0-9]+)(?:[.-]?\\*)?$";
private static final Pattern wildcard_r = Pattern.compile(WILDCARD_PATTERN);
I look for matches with:
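// Wildcards
// "source" is an assumed name for the input string holding the newline-separated items
Matcher wildcardPatternMatch = wildcard_r.matcher(source);
while (wildcardPatternMatch.find()) {
    String wildcard = wildcardPatternMatch.group();
    myProperty.add(new property(wildcard, providerId));
    System.out.println(wildcard);
}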
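For reference, the fragments above can be assembled into a self-contained reproduction. This is a minimal sketch: the class name `WildcardExamples`, the method `findWildcards`, and the sample input are assumptions made for illustration, and the `myProperty` / `providerId` bookkeeping is left out. Only the pattern itself is taken from the code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; only the pattern comes from the question.
class WildcardExamples {
    private static final String WILDCARD_PATTERN =
            "(?im)(?=^(?:\\*|.+\\*$))^(?:\\*[.-]?)?(?:(?!-)[a-z0-9-]+(?:(?<!-)\\.)?)+(?:[a-z0-9]+)(?:[.-]?\\*)?$";
    private static final Pattern wildcard_r = Pattern.compile(WILDCARD_PATTERN);

    // Collects every match of the multi-line pattern in the source string.
    static List<String> findWildcards(String source) {
        List<String> found = new ArrayList<>();
        Matcher wildcardPatternMatch = wildcard_r.matcher(source);
        while (wildcardPatternMatch.find()) {
            found.add(wildcardPatternMatch.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumed sample input: the six valid examples plus one line without a wildcard.
        String source = String.join("\n",
                "*test.com", "*.test.com", "*test", "test.*", "test*", "*test*", "test.com");
        for (String wildcard : findWildcards(source)) {
            System.out.println(wildcard);
        }
    }
}
```

With this input the six wildcard lines are matched and the plain `test.com` line is skipped, so the pattern itself behaves as intended; the problem is only how long it takes on large inputs.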
Are there any changes I can make to improve or optimise the regex, or do I need to look at running .replaceAll several times to remove all of the clutter before passing the string for regex matching?
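To make the "remove the clutter first" idea concrete, here is a rough sketch of what I have in mind, using plain string checks instead of .replaceAll. It splits the source into lines, skips any line that neither starts nor ends with `*` (the same condition the lookahead at the front of the pattern expresses), and only runs the regex on the lines that remain. The class and method names are hypothetical, it assumes lines are separated by `\n` or `\r\n`, and I have not measured whether it is actually faster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical class name; the pattern is unchanged from the question.
class WildcardPrefilter {
    private static final String WILDCARD_PATTERN =
            "(?im)(?=^(?:\\*|.+\\*$))^(?:\\*[.-]?)?(?:(?!-)[a-z0-9-]+(?:(?<!-)\\.)?)+(?:[a-z0-9]+)(?:[.-]?\\*)?$";
    private static final Pattern wildcard_r = Pattern.compile(WILDCARD_PATTERN);

    // Checks each line on its own, discarding obvious non-wildcards before the regex runs.
    static List<String> findWildcards(String source) {
        List<String> found = new ArrayList<>();
        // Assumption: items are separated by \n or \r\n.
        for (String line : source.split("\\r?\\n")) {
            // Cheap pre-filter: a wildcard line must start or end with '*'.
            if (!line.startsWith("*") && !line.endsWith("*")) {
                continue;
            }
            // matches() requires the whole line to satisfy the pattern.
            if (wildcard_r.matcher(line).matches()) {
                found.add(line);
            }
        }
        return found;
    }
}
```

Calling `WildcardPrefilter.findWildcards(source)` would replace the `find()` loop, with each returned string handled the same way as `wildcard` is now.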