I need to parse the tokens from a GS1 UDI format string:
"(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"
I would like to split that string with a regex on the "(nnn)"
and have the delimiter included with the split values, like this:
[ "(20)987111", "(240)A", "(10)ABC123", "(17)2022-04-01", "(21)888888888888888" ]
A JSFiddle with examples is linked below, but here are my attempts in case you want to see them right here:
// This includes the delimiter match in the results, but I want the delimiter included WITH the value
// after it, e.g.: ["(20)987111", ...]
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\))/).filter(Boolean))
// Result: ["(20)", "987111", "(240)", "A", "(10)", "ABC123", "(17)", "2022-04-01", "(21)", "888888888888888"]
// If I include a pattern that should (I think) match the content following the delimiter I will
// only get a single result that is the full string:
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\)\W+)/).filter(Boolean))
// Result: ["(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"]
// I think this is because I'm effectively matching the entire string, hence a single result.
// So now I'll try to match only up to the start of the next "(":
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\)(^\())/).filter(Boolean))
// Result: ["(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"]
I've found and read this question; however, the examples there match literals, whereas I'm using character classes and getting different results.
I'm failing to create a regex pattern that will provide what I'm after. Here's a JSFiddle of some of the things I've tried: https://jsfiddle.net/6bogpqLy/
I can't guarantee the order of the "application identifiers" in the input string and as such, match with named captures isn't an attractive option.
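For reference, here is a minimal sketch of one approach that appears to produce the output described above: splitting on a zero-width lookahead, so the "(nnn)" delimiter is never consumed and stays attached to the value that follows it. The helper name `splitUdi` is hypothetical, and the sketch assumes that no value itself contains a parenthesized run of two or more digits.

```javascript
// Sketch only: `splitUdi` is an illustrative name, not part of any library.
// Assumption: values never contain "(" + 2 or more digits + ")" themselves.
const udi = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";

function splitUdi(input) {
  // (?=...) is a lookahead: it matches the empty position just before
  // each "(nn)" delimiter, so split() cuts there without removing anything.
  return input.split(/(?=\(\d{2,}\))/).filter(Boolean);
}

console.log(splitUdi(udi));
// ["(20)987111", "(240)A", "(10)ABC123", "(17)2022-04-01", "(21)888888888888888"]
```

Because the split happens at every delimiter position, this does not depend on the order in which the application identifiers appear.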