Regex: Split string of varying length into multiple groups without using supporting code

Question

I have a string of varying length that contains multiple substrings.
These substrings are delimited by a colon.
I need to capture these substrings into groups, but I cannot use any supporting language to do this. It has to be regex only and work in this tester https://regexr.com/. The reason for this limitation is that I am cutting strings via a UI that doesn't support additional code (Adobe Analytics). This means I cannot use functions such as 'split()' or 'explode()'.
I would like a single expression as an answer.

Example1: test1:test2:test3:test4 would be broken into 4 groups.

test1
test2
test3
test4

Example2: 123:abc would be broken into just 2 groups.

123
abc

Is this possible? Thanks, Chris

Sorry ctwheels. I meant **capture**, not match. I've updated my question to suit — Chris, Feb 26 '18 at 21:27
That would only capture the last substring 'test4' or 'abc', and place it in group1. It would not capture all substrings and place them in separate groups. — Chris, Feb 26 '18 at 21:33
Then you’d have to construct a monster of `([^:]+):([^:]+)` — ctwheels, Feb 26 '18 at 21:34
Yeah - that is what I have been doing, but I got stuck on the varying length issue. How would I write this monster if some examples have 2 substrings in them, and others 10? — Chris, Feb 26 '18 at 21:37
then you need to make them optional using the `?` quantifier like so `([^:]+)(?::([^:]+))?(?::([^:]+))?` — ctwheels, Feb 26 '18 at 22:07
Thanks for the reply ctwheels. Ended up going with xtj7's solution below. — Chris, Feb 27 '18 at 00:11
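
The optional-group pattern from the comments above can be checked offline with a minimal sketch like the one below. Python's `re` module is an assumption made purely for illustration; the question targets regexr and Adobe Analytics, whose regex engines may behave differently.

```python
import re

# Pattern from the comment thread: one required group, two optional ones.
pattern = re.compile(r"([^:]+)(?::([^:]+))?(?::([^:]+))?")

# Two substrings: the third group does not participate in the match.
two = pattern.match("123:abc").groups()

# Three substrings: every group is filled.
three = pattern.match("a:b:c").groups()

print(two)
print(three)
```

Groups that do not take part in the match come back as `None` in Python; other engines report them as empty or undefined.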

xtj7 · Accepted Answer · 2018-02-26T21:52:48.460

1

Yes, quite simple actually:

/([^:]+)/

I hope that is what you meant :)
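
To illustrate what this pattern does, here is a minimal sketch, assuming Python's `re` module as a stand-in for the tester (the engine in regexr or Adobe Analytics may differ). Applied repeatedly, the pattern finds every substring as a separate match, but any single match only ever has one capture group.

```python
import re

pattern = re.compile(r"([^:]+)")
text = "test1:test2:test3:test4"

# Repeated (global) matching: one match per substring.
all_matches = pattern.findall(text)

# A single match: only group 1 exists, holding the first substring.
first_groups = pattern.match(text).groups()

print(all_matches)
print(first_groups)
```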

UPDATE

After you refined your question, I understand that you want multiple groups in one match. This is contrary to how you would normally use a regex (and you are probably aware of that), but given the limitations of your tool, the best you can do is a finite set of groups, which you have to read from your first match.

I am not familiar with the tool you use, so I can't say for sure if it won't produce any negative side-effects, but this would be the closest you could get. Example for maximum of 8 groups:

([^:]+)?:?([^:]+)?:?([^:]+)?:?([^:]+)?:?([^:]+)?:?([^:]+)?:?([^:]+)?:?([^:]+)?
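
As a rough check of how the 8-group expression behaves on the two examples from the question, here is a small sketch. It assumes Python's `re` module for illustration only; the helper name `capture` is hypothetical, and the target tool's engine may report unused groups differently.

```python
import re

# The 8-group expression, split across two lines for readability.
PATTERN = re.compile(
    r"([^:]+)?:?([^:]+)?:?([^:]+)?:?([^:]+)?:?"
    r"([^:]+)?:?([^:]+)?:?([^:]+)?:?([^:]+)?"
)

def capture(text):
    """Return all 8 capture groups; unused groups come back as None."""
    return PATTERN.match(text).groups()

example1 = capture("test1:test2:test3:test4")
example2 = capture("123:abc")

print(example1)
print(example2)
```

The groups are filled from left to right, and the groups beyond the last substring stay unset.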

Unfortunately, a proper solution that handles an unbounded number of groups is not possible. You need to create the capture groups manually. Simply duplicate the following for as many groups as you need (max):

([^:]+)?:?

It is ugly but might just work.
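
If typing the expression out by hand is error-prone, the duplication can be done once offline and the result pasted into the tool. The sketch below assumes Python is available for that one-off step; `build_pattern` is a hypothetical helper, and the limit of 20 is the upper bound mentioned in the comments.

```python
import re

def build_pattern(max_groups):
    """Join `max_groups` optional groups with optional colons."""
    return ":?".join(["([^:]+)?"] * max_groups)

# Generate the expression for up to 20 substrings.
pattern_text = build_pattern(20)
pattern_20 = re.compile(pattern_text)

# Sanity check against a short input before pasting into the tool.
groups = pattern_20.match("a:b:c").groups()

print(pattern_text)
print(groups[:4])
```

The printed expression is plain regex, so the tool itself still needs no supporting code.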

If you need this completely dynamic, however, that is not possible.

edited Feb 26 '18 at 21:52

answered Feb 26 '18 at 21:04

xtj7


Thanks for the reply but unfortunately, that isn't quite what I am after. While it does match, it doesn't capture. I'll update my question to be more precise. https://stackoverflow.com/questions/21200514/regular-expression-matching-vs-capturing – Chris Feb 26 '18 at 21:25
I don't quite understand that. Of course it captures, I do get individual matches for every substring as per your examples 1 and 2. Are you sure you copied the string including brackets? – xtj7 Feb 26 '18 at 21:33
I need to capture it into groups which the UI tool I am using reads. These 'capture groups' are separate from matches. I changed the link for the Regex tester in my question. If you click on the 'Details' section within the tester, you can see the difference between the groups and the matches – Chris Feb 26 '18 at 21:40
That is unfortunate and I can see your struggle. A dynamic solution for that is not possible, however. I've added the workaround solution, which should work dynamically **up to** a maximum number of matches. So it will match for 0 to x matches (depending on how often you duplicate it) and leave the remaining groups empty. If your tool can deal with that, it may be the closest you can get. – xtj7 Feb 26 '18 at 21:56
Thanks @xtj7! I'm marking your answer as correct as it is the closest I think I can get given my limitations. I tried it in the tool and it is working correctly. I should never have more than 20 substrings, so this will suit me fine. Thanks again, Chris :) – Chris Feb 26 '18 at 22:15
Thank you too! It is always frustrating to see that there is no elegant solution to a problem. But as long as it works... :) – xtj7 Feb 26 '18 at 22:17
It should instead be `([^:]+)(?::([^:]+))?` with the last part duplicated whatever amount of times – ctwheels Feb 27 '18 at 01:04

@xtj7 for context, this is for Adobe Analytics Classification Rule Builder. When you send data to AA, you have the base value (dimension). You can then create additional dimensions associated with that base value, and either upload data to it (using the base value as the unique key) or use Classification Rule Builder to generate values for them based off the original value. So let's say you have a value "a:b:c" populating the base report. You can, for example, create 3 additional dimensions and break up that base value into the 3 additional dimensions. This is useful for creating aggregated – CrayonViolent Feb 27 '18 at 01:45
data buckets. In principle, it's similar to looking at say the URL structure of a website, where you have first dir level path /foo and maybe lots of sub dirs /foo/bar or /foo/something, and this allows you to more easily look at metrics for everything in /foo – CrayonViolent Feb 27 '18 at 01:47
Here is an example of what the Classification Rule Builder interface (well, the relevant part of it) looks like: https://i.imgur.com/wmIvJoq.png so in this example, I might have an original value of "1-100:400-100:new york:10144:new york:[n/a]:[n/a]:[n/a]". I then set up a regex (the same regex) for each dimension (classification) for the base value (key), to push the individual pieces of the key to, and then I can go to each of those reports individually and just see the values for them, e.g. go to the City (v60) report and just see "new york" instead of the full value. – CrayonViolent Feb 27 '18 at 01:50
TL;DR - it's a way to push lots of data points into a single data point and then break them out later on, and the Classification Rule Builder helps do that – CrayonViolent Feb 27 '18 at 01:51
