I've been trying to extract sets of coordinates in the following format:

[-34.0, 23, 0.555] , [3, 4, 5], ....

For the first set, I wish to extract "-34.0", "23", and "0.555". For the second set, "3", "4", and "5".

I've found a way to do so on Stack Overflow and through my own experiments on https://regexr.com, but it means that ".0" and ".555" are also captured as subgroups, which I don't want.

\[([-]?\d+(\.\d+)?),\s([-]?\d+(\.\d+)?),\s([-]?\d+(\.\d+)?)\]

However, my initial alternatives are not working. Why are they not valid, and how can I write a regex that meets my requirements?

a: The left bracket of [\d] is not treated as a special character, so its right bracket gets paired with the left bracket of the [\. component

\[([-]?\d+[\.[\d]+]?),\s([-]?\d+[\.[\d]+]?),\s([-]?\d+[\.[\d]+]?)\]

b: The + sign is not treated as a special character

\[([-]?\d+[\.\d+]?),\s([-]?\d+[\.\d+]?),\s([-]?\d+[\.\d+]?)\]

Thank you for your time!

Update:

I have now been made aware of the non-capturing group feature.

First of all - thank you! It did the job I needed.

Second of all - I'm still curious as to why the other options didn't work, so I'll leave this up for the next 24 hours or so, at least.

Update v2:

Questions fully answered. Thank you so much, everyone!

  • Use `(?:...)` to make a non-capturing group. – Barmar Aug 03 '23 at 16:48
  • And you can omit some of the square brackets as well: `\[(-?\d+(?:\.\d+)?),\s(-?\d+(?:\.\d+)?), (-?\d+(?:\.\d+)?)]` See https://regex101.com/r/lVEMuX/1 – The fourth bird Aug 03 '23 at 16:49
  • Your previous attempts did not work because this part `\d+[\.[\d]+` has to match at least 2 characters, which the single-digit numbers in `[3, 4, 5]` do not have. The `\d+` matches 1 or more digits, and the notation `[\.[\d]+` is a character class that matches, 1 or more times, a `.`, a digit, or a `[` char – The fourth bird Aug 03 '23 at 17:23
  • @Barmar, The fourth bird The non-capturing group did it! Thanks for the help, everyone. As stated in my update, there are still some other regex things I'd like to understand, so I'll leave the post up for now. – Chad Chaddington Aug 03 '23 at 17:23
  • @Thefourthbird But why did it consider the second `[` a normal character rather than the start of a new set? – Chad Chaddington Aug 03 '23 at 17:26
  • @ChadChaddington Because that is the notation; read more about [character classes here](https://www.regular-expressions.info/charclass.html). Note that you can write `[\d]+` just as `\d+` to avoid any confusion, and writing `[.\d]+` matches 1 or more times a dot or a digit (so it can also match only dots) – The fourth bird Aug 03 '23 at 17:29
  • `[` inside `[]` has no special meaning. You also don't have to escape `.` inside `[]`. – Barmar Aug 03 '23 at 17:34

1 Answer

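Your pattern does not match because \d+[\.[\d]+]? matches one or more digits (\d+), followed by a character class [\.[\d]+ that matches one of the listed characters (a dot, a [ or a digit) one or more times, and then an optional ]. Every number therefore needs at least two characters, which the single digits in [3, 4, 5] do not have.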
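To see this behaviour concretely, here is a minimal sketch that runs both attempts from the question against the sample input. It assumes Python's `re` module, which should treat these particular constructs the same way as the engine on regexr; the variable names are only illustrative.

```python
import re

text = "[-34.0, 23, 0.555] , [3, 4, 5]"

# Attempt (a): [\.[\d]+ is ONE character class (".", "[" or a digit) repeated
# one or more times, followed by an optional literal "]".
attempt_a = r"\[([-]?\d+[\.[\d]+]?),\s([-]?\d+[\.[\d]+]?),\s([-]?\d+[\.[\d]+]?)\]"

# Attempt (b): [\.\d+]? is an optional SINGLE character: ".", a digit or a
# literal "+". Inside a character class the "+" is not a quantifier.
attempt_b = r"\[([-]?\d+[\.\d+]?),\s([-]?\d+[\.\d+]?),\s([-]?\d+[\.\d+]?)\]"

matches_a = [m.group(0) for m in re.finditer(attempt_a, text)]
matches_b = [m.group(0) for m in re.finditer(attempt_b, text)]

print(matches_a)  # ['[-34.0, 23, 0.555]']  (single-digit numbers are too short)
print(matches_b)  # ['[3, 4, 5]']           (at most one character after the digits)
```

Attempt (a) only finds the first set because every number needs at least two characters, and attempt (b) only finds the second set because it allows at most one extra character after the leading digits, so "-34.0" and "0.555" cannot be matched in full.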

You could write the pattern using 3 capture groups, with optional non-capturing groups (?:...)?

\[(-?\d+(?:\.\d+)?),\s(-?\d+(?:\.\d+)?),\s(-?\d+(?:\.\d+)?)]

See a regex demo.
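As a rough illustration of the difference, the sketch below compares the original pattern from the question with the non-capturing version. It assumes Python's `re` module; the helper name `number` and the conversion to floats are additions for the example, not part of the question.

```python
import re

text = "[-34.0, 23, 0.555] , [3, 4, 5]"

# Original pattern from the question: the inner (\.\d+)? groups are captured too.
with_subgroups = r"\[([-]?\d+(\.\d+)?),\s([-]?\d+(\.\d+)?),\s([-]?\d+(\.\d+)?)\]"

# Same idea with (?:...) so that only the three full numbers are captured.
number = r"-?\d+(?:\.\d+)?"  # hypothetical helper to avoid repeating the sub-pattern
non_capturing = rf"\[({number}),\s({number}),\s({number})]"

noisy = re.findall(with_subgroups, text)
triples = re.findall(non_capturing, text)

print(noisy)
# [('-34.0', '.0', '23', '', '0.555', '.555'), ('3', '', '4', '', '5', '')]
print(triples)
# [('-34.0', '23', '0.555'), ('3', '4', '5')]

# Optional follow-up: turn the captured strings into numbers.
coords = [tuple(float(v) for v in t) for t in triples]
print(coords)
# [(-34.0, 23.0, 0.555), (3.0, 4.0, 5.0)]
```

With the non-capturing groups, `re.findall` returns exactly three strings per set, so no post-filtering of the fractional parts is needed.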

Some notation notes:

  • [-]? ---> -?
  • [\.] ---> [.] or \.
  • \d+[\.[\d]+]? ---> I think you meant \d+[.\d]*, but note that [.\d]* can also match only dots, because the character class allows any of the listed characters to repeat zero or more times.
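The last note can be checked with a small sketch, again assuming Python's `re` module. The names `loose` and `strict` and the test strings are made up for the example.

```python
import re

loose = re.compile(r"\d+[.\d]*")       # digits, then any mix of dots and digits
strict = re.compile(r"\d+(?:\.\d+)?")  # digits, then at most one ".digits" part

samples = ["3", "0.555", "3...", "1.2.3"]
results = {s: (bool(loose.fullmatch(s)), bool(strict.fullmatch(s))) for s in samples}

for s, (is_loose, is_strict) in results.items():
    print(f"{s!r}: loose={is_loose}, strict={is_strict}")
# '3': loose=True, strict=True
# '0.555': loose=True, strict=True
# '3...': loose=True, strict=False
# '1.2.3': loose=True, strict=False
```

The character-class version accepts strings such as "3..." that are not valid numbers, which is why the optional non-capturing group is the safer choice here.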

For the notation, see character classes

The fourth bird