I'm trying to parse the words out of a delimited string and have each word land in its own capture group, in sequential order. For example:

dog.cat.chicken.horse.whale

I know of ([^.]+), which matches each word, but this puts every word in capture group 1 of a separate match:

Match 1
Full match  0-3 `dog`
Group 1.    0-3 `dog`
Match 2
Full match  4-7 `cat`
Group 1.    4-7 `cat`
Match 3
Full match  8-15    `chicken`
Group 1.    8-15    `chicken`
Match 4
Full match  16-21   `horse`
Group 1.    16-21   `horse`
Match 5
Full match  22-27   `whale`
Group 1.    22-27   `whale`

What I really need is something like

Match 1
Full match  0-27    `dog.cat.chicken.horse.whale`
Group 1.    0-3 `dog`
Group 2.    4-7 `cat`
Group 3.    8-15    `chicken`
Group 4.    16-21   `horse`
Group 5.    22-27   `whale`

I've tried multiple iterations with no success. Does anyone know how to do this?

  • Mention language in use. – Rahul Jan 12 '18 at 17:55
  • `([^.]+)\.([^.]+)\.([^.]+)\.([^.]+)\.([^.]+)` is the only way to accomplish this in regex alone (for golang) that I'm aware of. Your best bet is taking each capture group 1 and adding it to a list, basically mapping it. Why not just split on `.`? – ctwheels Jan 12 '18 at 18:23
  • This would work if my delimited strings were all the same length, but some aren't. I need something that can dynamically parse out the fields between the periods no matter how many there are. – Richard Oswald Jan 12 '18 at 18:29
  • What about using [`FindAllString`](https://golang.org/pkg/regexp/#Regexp.FindAllString) and using it like [this](https://tio.run/##Jc7LCsIwFATQde9XXLJKsV5wq3QhgmvBpaiENE1D8yKNDxC/vYa6GhiGw@gwz1HIUWiFThgPYFwMKSOHivUus13JpLR6RwY1QP/wchnyGj9QPUXCCVtkXdAkRSY5GDkqT0NIk6LXIKxiUKUG77ht8Q/RIbhorOLscqPrihV2gfoCJToa3@2tPedkvOZTg@tNDVW5QqfSZOt5X8MX5vkH)? – ctwheels Jan 12 '18 at 18:41
  • Why not just use a split method on `.`? – CAustin Jan 12 '18 at 18:50
  • Thanks for your suggestions; this was an error on my part. These regular expressions are used in Prometheus. I thought it used the golang version, but now I think I was wrong. Unfortunately I can't use ctwheels' or CAustin's suggestions in this case. – Richard Oswald Jan 12 '18 at 19:13
  • Can you share your actual use case rather than a hypothetical? – brian-brazil Jan 12 '18 at 19:35
  • 1
    There is no good solution for this case. All you might do is add optional non-capturing groups with these capturing ones to account for some set number of groups. `([^.]+)\.([^.]+)\.([^.]+)\.([^.]+)(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?` - something like this. – Wiktor Stribiżew Jan 12 '18 at 20:24
  • This did the trick @WiktorStribiżew! Thank you! – Richard Oswald Jan 12 '18 at 22:07

1 Answer

There is no good solution for this case. All you can do is append optional non-capturing groups, each wrapping one more capturing group, to cover up to some fixed number of fields.
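To illustrate why a single repeated group does not help, here is a minimal sketch assuming Go's `regexp` package (RE2 syntax), the flavor discussed in the comments. The pattern and the `lastCapture` helper are made up for illustration. A capture group inside a repetition reports only its final iteration, so the earlier words are lost:

```go
package main

import (
	"fmt"
	"regexp"
)

// repeated wraps the single capture group in a repetition.
// Illustrative pattern only; it is not taken from the answer.
var repeated = regexp.MustCompile(`^(?:([^.]+)\.?)+$`)

// lastCapture returns what group 1 holds after a successful match,
// or "" if the input does not match at all.
func lastCapture(s string) string {
	m := repeated.FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	// Group 1 is overwritten on every iteration, so only the last word survives.
	fmt.Println(lastCapture("dog.cat.chicken.horse.whale")) // whale
}
```

Engines that expose every iteration of a repeated group exist, but Go's `FindStringSubmatch` returns exactly one value per group.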

So, it might look like

([^.]+)\.([^.]+)\.([^.]+)\.([^.]+)\.([^.]+)(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?

and so on: just add more (?:\.([^.]+))? groups until you reach a limit that you define yourself.
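If the pattern is generated rather than typed by hand, the repetition can be written once. The following is a sketch only, again assuming Go's `regexp` package; `buildPattern`, `splitFields`, `fieldsRe`, and the limits 2 and 6 are hypothetical names and values chosen for the example:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// buildPattern is a hypothetical helper. It returns an anchored pattern with
// `required` mandatory dot-separated fields (must be at least 1) followed by
// `optional` optional ones, each in its own capture group.
func buildPattern(required, optional int) string {
	const field = `([^.]+)`
	parts := make([]string, required)
	for i := range parts {
		parts[i] = field
	}
	return "^" + strings.Join(parts, `\.`) +
		strings.Repeat(`(?:\.`+field+`)?`, optional) + "$"
}

// fieldsRe accepts between 2 and 8 fields; the limits are arbitrary.
var fieldsRe = regexp.MustCompile(buildPattern(2, 6))

// splitFields returns the captured fields in order, skipping optional
// groups that did not take part in the match. It returns nil on no match.
func splitFields(re *regexp.Regexp, s string) []string {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return nil
	}
	var out []string
	for _, g := range m[1:] {
		if g != "" { // a field can never be empty, so "" means "group unused"
			out = append(out, g)
		}
	}
	return out
}

func main() {
	fmt.Println(splitFields(fieldsRe, "dog.cat.chicken.horse.whale"))
	// [dog cat chicken horse whale]
}
```

The upper limit is still fixed when the pattern is built; the generator only saves typing.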

See the regex demo.

Note that you might want to anchor the pattern to avoid partial matches:

^([^.]+)\.([^.]+)\.([^.]+)\.([^.]+)\.([^.]+)(?:\.([^.]+))?(?:\.([^.]+))?(?:\.([^.]+))?$

The ^ matches the start of the string and $ asserts the position at the end of the string.
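The effect of anchoring shows up when the input has more fields than the pattern allows. A small sketch, assuming Go's `regexp` package and a deliberately short pattern (two required fields, one optional) that is not the one from the answer:

```go
package main

import (
	"fmt"
	"regexp"
)

// Two required fields plus one optional field: three fields at most.
const body = `([^.]+)\.([^.]+)(?:\.([^.]+))?`

var (
	loose  = regexp.MustCompile(body)             // no anchors
	strict = regexp.MustCompile(`^` + body + `$`) // anchored at both ends
)

func main() {
	s := "dog.cat.chicken.horse.whale" // five fields, more than the pattern allows

	// Unanchored: quietly matches only the first three fields.
	fmt.Println(loose.FindStringSubmatch(s)[0]) // dog.cat.chicken

	// Anchored: the whole string must fit, so there is no match at all.
	fmt.Println(strict.FindStringSubmatch(s) == nil) // true
}
```

With anchors, an input that exceeds the limit fails outright instead of being truncated without warning.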

Wiktor Stribiżew