How to use regex to separate a string that contains any character and then ends in exclusively numbers?

Question

So I've been trying to come up with a regex that separates these kinds of strings: A100, A-100, A1-100, A1_100, A1A100, "A-100", and many other examples.

The strings always "end" in digits only. I say "end" because they can be inside quotation marks, so the digits are technically not at the end of the string, though they are followed by a word boundary.

What I need is to get both parts: whatever comes before the trailing digits, and the trailing digits themselves. I need them separated because I might need to do some additions on the digits-only part.

What I've tried is:

At the very start it was easy: `A100` was easily separated with something like `([a-zA-Z]+)(\d+)`. But then I needed to separate `A_100` into one string with `A_` and another with `100`, or, for `A1-100`, into `A1-` and the number part `100`.
After many iterations on this problem I ended up with this messy regex:
```
([a-zA-Z\+\.\?\!_\-\\\d]+[a-zA-Z\+\.\?\!_\-\\]+)(\d+)
```
It separates a lot of what I need EXCEPT the simpler `A100`, because if the first part of the string has a digit in it (like `A1A100`), it then has to contain something other than a digit before the final number, or else I would just get `A1` and `A100`. But this is very messy, and I would rather do something simple like `([^\n])(\d+)` (this obviously doesn't work): capture any run of characters other than newlines, then capture the digits-only part at the end.
I tried to use lookaheads, but I'm not very good with them. `((?=\d+)\d+)` gets me just the number part of `A100`, but I can't for the life of me combine it with a pattern for the preceding characters.

All of this with an implementation that works with C# and .NET. Any guidance?

41686d6564 stands w. Palestine · Accepted Answer · 2021-06-17T19:49:13.587

4

You may use the following pattern:

\b([A-Za-z]+(?:[A-Za-z0-9]*[A-Za-z_\-])?)(\d+)\b

Details:

\b - Word boundary.
( - Start of group 1.
- [A-Za-z]+ - Match one or more letters.
- (?: - Start of a non-capturing group.
  - [A-Za-z0-9]* - Match zero or more alphanumeric characters.
  - [A-Za-z_\-] - Match a single letter, underscore, or hyphen.
- )? - Close the non-capturing group and make it optional.
) - Close group 1.
(\d+) - Match one or more digits and capture them in group 2.
\b - Word boundary.

Note: It's not entirely clear from your question what characters are accepted. This assumes letters, digits, an underscore, and a hyphen. Feel free to add more characters in the appropriate character class if you need to support more.
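
Since the question mentions C# and doing additions on the numeric part, here is a minimal sketch of how the pattern above might be used. The class and method names (`TrailingNumberSplitter`, `TrySplit`, `AddToNumber`) are made up for illustration, and the sketch assumes the digits are plain ASCII and fit in a `long`; leading zeros would not be preserved.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class TrailingNumberSplitter
{
    // The pattern from above: group 1 is the prefix, group 2 the trailing digits.
    private static readonly Regex Pattern =
        new Regex(@"\b([A-Za-z]+(?:[A-Za-z0-9]*[A-Za-z_\-])?)(\d+)\b");

    // Splits the first matching token found in the input.
    public static bool TrySplit(string input, out string prefix, out string number)
    {
        Match m = Pattern.Match(input);
        prefix = m.Success ? m.Groups[1].Value : string.Empty;
        number = m.Success ? m.Groups[2].Value : string.Empty;
        return m.Success;
    }

    // Adds delta to the numeric part of every matching token.
    // Assumption: ASCII digits that fit in a long; leading zeros are dropped.
    public static string AddToNumber(string input, long delta)
    {
        return Pattern.Replace(input, m =>
            m.Groups[1].Value +
            (long.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture) + delta)
                .ToString(CultureInfo.InvariantCulture));
    }

    public static void Main()
    {
        string[] samples = { "A100", "A-100", "A1-100", "A1_100", "A1A100", "\"A-100\"" };
        foreach (string s in samples)
        {
            if (TrySplit(s, out string prefix, out string number))
                Console.WriteLine($"{s} -> prefix '{prefix}', number '{number}'");
        }

        // The surrounding quotes are left untouched: prints "A-105"
        Console.WriteLine(AddToNumber("\"A-100\"", 5));
    }
}
```

Keep in mind that in .NET `\d` matches any Unicode decimal digit unless `RegexOptions.ECMAScript` is set, so if the input may contain non-ASCII digits you might prefer `[0-9]` in group 2.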

edited Jun 17 '21 at 19:49

answered Jun 17 '21 at 19:19

Yes, this worked!! Now, I need to read more on non capturing groups, they just don't click with me yet. Also, just changed `[A-Za-z0-9]*` for `[A-Za-z0-9_]*` to include underscores inside the string. Thank you a lot! – aklassen Jun 17 '21 at 20:51
@aklassen See: [What is a non-capturing group in regular expressions?](https://stackoverflow.com/q/3512471/8967612) – 41686d6564 stands w. Palestine Jun 17 '21 at 20:55
Thanks, based on that reading and on your regex I managed to reduce it to `\b((?:.*\D)?)(\d+)\b`, which separates exactly the last part with digits and doesn't care what characters are in the first part; the string can even start with numbers and it works. So thanks again, learnt a lot from this. – aklassen Jun 17 '21 at 21:49
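
A small C# sketch of how the simplified pattern from the last comment behaves may be useful. The class and method names are hypothetical. One thing to watch: because `.*` matches anything except a newline, the prefix can swallow earlier tokens when the input holds more than one, so this version seems best suited to inputs containing a single token.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class for illustration only.
public static class SimplifiedPatternDemo
{
    // The simplified pattern from the comment above.
    private static readonly Regex Simplified = new Regex(@"\b((?:.*\D)?)(\d+)\b");

    // Shows group 1 (prefix) and group 2 (digits) of the first match in brackets.
    public static string Describe(string input)
    {
        Match m = Simplified.Match(input);
        return m.Success
            ? $"[{m.Groups[1].Value}] [{m.Groups[2].Value}]"
            : "no match";
    }

    public static void Main()
    {
        Console.WriteLine(Describe("A1A100"));    // [A1A] [100]
        Console.WriteLine(Describe("1A100"));     // [1A] [100]
        Console.WriteLine(Describe("100"));       // [] [100]

        // Greedy .* runs across the space, so the first token ends up in the prefix.
        Console.WriteLine(Describe("A100 B200")); // [A100 B] [200]
    }
}
```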
