Vanilla Regex: is it possible to trim specific chars from the MIDDLE of a string with only regex?

Question

So currently I am working with a parsed yaml file that uses Regex to validate string values in a loan schema.

Specifically, right now I am working on postal code validation. I saw a post that provided a regex for validating postal codes that may contain hyphens or spaces.

My current pattern, ^\d{5}(?:[-\s]\d{4})?\s+$\g, matches the following example formats, accepting either 5-digit or 9-digit ZIP codes:

12345-1234
12345 1234
12345

I found this solution here at Stack Overflow and have since combined it with this approach to trim whitespace off the ends of the string.

While this originally fit the criteria, my company has since asked me to strip any hyphens or spaces from the postal code value, giving the following output:

12345-1234 -> 123451234
12345 1234 -> 123451234
 12345  -> 12345

I can do this directly in Java but, since I have developed a framework in Java that runs hundreds of different string validations, I'd like to avoid having to make a specific code block just for this one string validation.

Is there an efficient way for me to trim characters from the middle of a string using solely vanilla regex?

I know a lot about regex, but have never heard of yaml loan schema. I can't understand what you want. — sln, Dec 03 '21 at 20:19
regex doesn't change strings; it only matches them, or part of them, so to "trim" characters, you must write that tiny amount of Java you're dreading: `str = str.replaceAll("(?<=^\\d{5})[-\\s](?=\\d{4}\\s*$)", "");` — Bohemian, Dec 03 '21 at 20:32
@Bohemian Well your statement is certainly true for the regex that is used in the answer the OP has linked to. But if you wanted to remove trailing blanks in a string and you were allowed to assume that the string contained at least one non-blank character, then you could match the string against `.*\S(?=\s*$)` and I would think that the matched string would be the original string without the trailing blanks. Of course, that's no help here. — Booboo, Dec 03 '21 at 20:45
So is this running under a java regex flavor? I can give you a .Net regex solution if you want to avoid Java.... — ΩmegaMan, Dec 03 '21 at 21:35
Replacing all *non digits* `\D+` with empty, or am I missing something :S — bobble bubble, Dec 04 '21 at 00:36

The fourth bird · Answer 1 · 2021-12-03T21:44:40.780

Java does support lookarounds, but if you are using replaceAll you can also replace with 2 capture groups instead, as lookarounds can be costly. (There is no such thing as vanilla regex; there are many different regex engines.)
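
As a rough illustration of the two options (a sketch only, not benchmarked; the class and method names are invented for this example), both variants below remove a single separator between the 5-digit and 4-digit parts and leave any other input untouched:

```java
public class SeparatorStrip {
    // Lookaround variant: only the separator itself is matched and replaced.
    public static String withLookarounds(String s) {
        return s.replaceAll("(?<=^\\d{5})[-\\s](?=\\d{4}$)", "");
    }

    // Capture-group variant: the whole value is matched and rebuilt from groups 1 and 2.
    public static String withGroups(String s) {
        return s.replaceAll("^(\\d{5})[-\\s](\\d{4})$", "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(withLookarounds("12345-1234")); // 123451234
        System.out.println(withGroups("12345 1234"));      // 123451234
        System.out.println(withGroups("12345"));           // 12345 (no match, unchanged)
    }
}
```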

Note that \s can also match a newline, and \s+ at the end matches 1 or more whitespace chars that might change the format of the yaml file.

Instead you might use \h to match horizontal whitespace chars.
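
A quick way to see the difference, as a small sketch (it assumes Java 8 or later, where \h is available; the class and method names are made up for this example):

```java
public class WhitespaceClasses {
    // True if the whole string is a single character matched by \s.
    public static boolean matchesAnyWhitespace(String s) {
        return s.matches("\\s");
    }

    // True if the whole string is a single character matched by \h.
    public static boolean matchesHorizontalWhitespace(String s) {
        return s.matches("\\h");
    }

    public static void main(String[] args) {
        System.out.println(matchesAnyWhitespace("\n"));        // true: \s accepts a newline
        System.out.println(matchesHorizontalWhitespace("\n")); // false: \h does not
        System.out.println(matchesHorizontalWhitespace(" "));  // true
        System.out.println(matchesHorizontalWhitespace("\t")); // true
    }
}
```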

Using [-\s] in the pattern matches a single char, being either - or a whitespace char. If there is always one of them present, and maybe more, you can also use a quantifier there.

^(\d{5})(?:[-\h]+(\d{4}))?\h+$

Regex demo | Java demo

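import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Group 1: the 5 digits; group 2: the optional 4 digits; \h+ consumes trailing horizontal whitespace
String regex = "^(\\d{5})(?:[-\\h]+(\\d{4}))?\\h+$";
String string = "12345-1234 \n"
        + "12345 1234 \n"
        + "12345 ";

// MULTILINE lets ^ and $ match at the start and end of each line
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
// Rebuild each line from the two groups; an unmatched group 2 contributes nothing
System.out.println(matcher.replaceAll("$1$2"));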

Output

123451234
123451234
12345
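
If the framework can carry a replacement string next to each pattern, the same idea might be applied generically rather than as a special case. The following is only a sketch under that assumption: the PatternRule class is hypothetical and not part of any existing framework, and the pattern is a variation on the one above that also allows leading whitespace and makes trailing whitespace optional, to cover the " 12345 " example from the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical rule type: a validation pattern plus a replacement template.
public class PatternRule {
    private final Pattern pattern;
    private final String replacement;

    public PatternRule(String regex, String replacement) {
        this.pattern = Pattern.compile(regex);
        this.replacement = replacement;
    }

    // Returns the normalized value if the whole input is valid, otherwise empty.
    public Optional<String> apply(String input) {
        Matcher m = pattern.matcher(input);
        if (!m.matches()) {
            return Optional.empty();
        }
        // replaceAll resets the matcher and rebuilds the value from the groups.
        return Optional.of(m.replaceAll(replacement));
    }

    public static void main(String[] args) {
        // Assumed variation of the pattern above: optional whitespace on both sides.
        PatternRule zip = new PatternRule(
                "^\\h*(\\d{5})(?:[-\\h]+(\\d{4}))?\\h*$", "$1$2");
        System.out.println(zip.apply("12345-1234").orElse("invalid")); // 123451234
        System.out.println(zip.apply(" 12345  ").orElse("invalid"));   // 12345
        System.out.println(zip.apply("1234").orElse("invalid"));       // invalid
    }
}
```

With a rule shaped like this, the postal code case needs only a different regex and replacement in the configuration, not a dedicated code path.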

@sln Thanks, very much appreciated coming from a regex expert like you :-) — The fourth bird, Dec 03 '21 at 23:46
