3

There are plenty of these questions on SO, but I can't find one specific to this case: extracting vector coordinates as integer values.

I'm parsing the following string:

"AT (1,1) ADDROOM RM0001"

I need to get the numbers between (#,#) and store them as integers.

One solution I tried was to get the index of the first '(', read the string until a comma is met, and then read again until the last ')' is met. Another solution I tried was to find the ',' and take the text before and after it, stopping at the parentheses.

Both feel error-prone, though. I suspect there's an easier way to read this from the string, and I have been looking into Regex.

Bob

5 Answers

4

This is the regex you need:

\((-?\d+)\s*,\s*(-?\d+)\)

You can use it like this:

var match = Regex.Match("some string (1234, -5678)", @"\((-?\d+)\s*,\s*(-?\d+)\)");
var x = match.Groups[1].Value;
var y = match.Groups[2].Value;

x and y will hold the two components of the vector as strings; pass them to int.Parse to get integers.

Explanation:

  • \( open parenthesis
  • (-?\d+) first capturing group that captures the first component of the vector. The - is optional (so both positive and negative integers are matched) followed by 1 or more digits (\d). This same thing appears again later in the regex to capture the second component
  • \s* allows any amount of whitespace between the numbers and the comma
  • \) close parenthesis

For a lower level explanation, go to https://regex101.com/r/PnwBJS/1 and look at the "explanation" section.
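
Since the question asks for integers, here is a minimal sketch of the conversion step, using the pattern above. The class name `VectorParser` and the method `TryParseVector` are hypothetical names made up for this illustration, and using the invariant culture is an assumption that the input always uses a plain ASCII minus sign.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class VectorParser
{
    // Same pattern as in the answer: "(x, y)" with optional minus signs.
    private static readonly Regex Pattern =
        new Regex(@"\((-?\d+)\s*,\s*(-?\d+)\)");

    // Returns true and sets x/y when the input contains an "(x,y)" pair
    // whose components both fit in an int; otherwise returns false.
    public static bool TryParseVector(string input, out int x, out int y)
    {
        x = 0;
        y = 0;
        Match match = Pattern.Match(input);
        if (!match.Success)
        {
            return false;
        }

        // TryParse (rather than Parse) guards against values that are
        // too large for an int. The invariant culture is an assumption.
        return int.TryParse(match.Groups[1].Value, NumberStyles.Integer,
                            CultureInfo.InvariantCulture, out x)
            && int.TryParse(match.Groups[2].Value, NumberStyles.Integer,
                            CultureInfo.InvariantCulture, out y);
    }

    public static void Main()
    {
        if (TryParseVector("some string (1234, -5678)", out int x, out int y))
        {
            Console.WriteLine($"{x}, {y}");
        }
    }
}
```

Checking `match.Success` first matters: on a failed match the groups are empty strings, so parsing them directly would fail.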

Sweeper
  • Ooh. Negative numbers. Though OP never specified that, that's actually a really good point. – Nyerguds Apr 27 '18 at 05:52
  • Your x and y variables are incorrect, though; they'll be the group objects, not their integer value content. This is one of the reasons I dislike using `var`... you can write code like this without noticing details like that. – Nyerguds Apr 27 '18 at 05:53
  • @Nyerguds I never intended to make them `int`s. I'll leave it to OP to do that. – Sweeper Apr 27 '18 at 05:56
  • 1
    Still, unless you add `.Value`, claiming that "x and y will be the two components of the vector" is just wrong. They will _contain_ these components _in one of their properties_. This distinction is important for someone who hasn't worked with regex before. – Nyerguds Apr 27 '18 at 05:57
3

Here's a short program that captures the two numbers separately:

using System;    
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        string s = "AT (1,1) ADDROOM RM0001";
        Regex r = new Regex(@"\((\d+)[^\d]+(\d+)\)");
        MatchCollection mc = r.Matches(s);

        var firstNumber = mc[0].Groups[1].Value;
        var secondNumber = mc[0].Groups[2].Value;
    }
}

Explanation

  • \( matches the open parenthesis
  • (\d+) matches a number; one or more digits
  • [^\d]+ matches one or more non-digit characters (here, the comma and any surrounding whitespace)
  • (\d+) matches a number; one or more digits
  • \) matches the close parenthesis
Jake Reece
1

Regex would be the easiest approach here, but I suppose you would have to filter the result anyway, as a bare \d+ (the regex for an integer number) would also extract the numbers outside the parentheses (I don't know too much about Regex). Another approach would be what you describe: filter the string and then parse the pieces into integers. The last approach I can think of is using LINQ and then parsing into integers.

Here is the Regex approach I used for your string:

string line = "AT (1,1) ADDROOM RM0001";
string output = Regex.Match(line, @"\(([^)]*)\)").Groups[1].Value;
string[] result = output.Split(',');

Then you just parse the strings in the array as you need them.
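
As a rough sketch of that last parsing step, under the assumption that every piece between the commas should be an integer: the class `SplitParser` and the method `ParseCoordinates` are hypothetical names for this example, and returning null on bad input is just one possible convention.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class SplitParser
{
    // Extracts the text between the first pair of parentheses, splits it on
    // commas, and parses each piece. Returns null if there are no
    // parentheses or if any piece is not a valid int.
    public static int[] ParseCoordinates(string line)
    {
        Match match = Regex.Match(line, @"\(([^)]*)\)");
        if (!match.Success)
        {
            return null;
        }

        string[] parts = match.Groups[1].Value.Split(',');
        int[] values = new int[parts.Length];
        for (int i = 0; i < parts.Length; i++)
        {
            // NumberStyles.Integer tolerates surrounding whitespace and a
            // leading sign, so "( 35 , -200 )" parses as well.
            if (!int.TryParse(parts[i], NumberStyles.Integer,
                              CultureInfo.InvariantCulture, out values[i]))
            {
                return null;
            }
        }
        return values;
    }

    public static void Main()
    {
        int[] coords = ParseCoordinates("AT (1,1) ADDROOM RM0001");
        if (coords != null)
        {
            Console.WriteLine(string.Join(", ", coords));
        }
    }
}
```

Because the pattern accepts anything between the parentheses, the validation happens in `int.TryParse` rather than in the regex itself.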

Pato Srna
  • 1
    A well-designed regex pattern would not extract numbers outside the parentheses... obviously you shouldn't just match bare `\d`. Patterns can always be made more specific to more accurately match the data, but OP has really only given one example... – Nyerguds Apr 27 '18 at 06:29
  • That is correct, as i said i dont know much about Regex so i would assume the patterns can get quite specific and complex – Pato Srna Apr 27 '18 at 07:19
1

As far as regexes go, this case seems rather trivial, but I know regex can be hard to wrap your head around, so I'll go over it in detail.

In regex, a single digit can be represented by \d. I'm assuming you want to capture coordinates larger than 9 too though, so we'll use \d+ to capture one or more digits.

For the rest, the comma is just a literal, and it doesn't have any special function in regex when not used inside specific structures. Unescaped brackets become capture groups, so literal brackets need to be escaped by putting a \ before them. You need two capture groups here, one around each of your numbers.

With that, your (#,#), as regex, will become \((\d+),(\d+)\). Literal opening bracket, a capture group with one or more number characters in it, a literal comma, another number characters capture group, and finally, the literal closing bracket.

Test it out online

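Then, it's just a matter of getting the capture groups and parsing them as integers. Since the groups are guaranteed to contain one or more digits, the parse will succeed for any ordinary coordinate, so TryParse is usually unnecessary. The exceptions are values too large for an Int32, which make Int32.Parse throw an OverflowException, and non-ASCII Unicode digits, which .NET's \d also matches unless RegexOptions.ECMAScript is specified.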

    String str ="AT (1,1) ADDROOM RM0001";
    // Double all backslashes in the code, or prefix the quoted part with @
    Regex coords = new Regex("\\((\\d+),(\\d+)\\)");
    MatchCollection matches = coords.Matches(str);
    if (matches.Count > 0)
    {
        Int32 coord1 = Int32.Parse(matches[0].Groups[1].Value);
        Int32 coord2 = Int32.Parse(matches[0].Groups[2].Value);
        // Process the values...
    }

Of course, if you have multiple matches to process, you can use a foreach to loop over the contents of the MatchCollection.
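
A minimal sketch of that loop, assuming an input line that contains several coordinate pairs. The sample string, the class `MultiMatch` and the method `FindAll` are made up for this illustration; it uses TryParse so that a pair too large for an Int32 is skipped instead of throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MultiMatch
{
    // Collects every "(x,y)" pair found in the text, in order of appearance.
    public static List<(int X, int Y)> FindAll(string text)
    {
        var result = new List<(int X, int Y)>();
        Regex coords = new Regex(@"\((\d+),(\d+)\)");

        foreach (Match m in coords.Matches(text))
        {
            // Pairs whose values do not fit in an int are skipped.
            if (int.TryParse(m.Groups[1].Value, out int x) &&
                int.TryParse(m.Groups[2].Value, out int y))
            {
                result.Add((x, y));
            }
        }
        return result;
    }

    public static void Main()
    {
        // Hypothetical input with two commands on one line.
        string text = "AT (1,1) ADDROOM RM0001; AT (12,7) ADDROOM RM0002";
        foreach (var (x, y) in FindAll(text))
        {
            Console.WriteLine($"x={x}, y={y}");
        }
    }
}
```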

[edit]

As pointed out by Sweeper, if you want to capture both positive and negative numbers, the capture groups should become (-?\d+). The question mark indicates an optional component, meaning there can be either 0 or 1 of the literal - character that's before it. This would give:

\((-?\d+),(-?\d+)\)

And if you want to allow spaces between these coordinates and the brackets / commas, you'll need to add some \s* between the groups, brackets and commas too. The * is a quantifier (like + and ?) which indicates zero or more occurrences of the element before it. \s is shorthand for "whitespace", just like \d is for digits.

This more lenient version would be

\(\s*(-?\d+)\s*,\s*(-?\d+)\s*\)
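
To see the difference between the two patterns, a small comparison like the following may help. The sample inputs and the class name `PatternComparison` are made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternComparison
{
    // The negative-aware pattern without whitespace handling.
    public const string Strict = @"\((-?\d+),(-?\d+)\)";

    // The lenient version that also allows whitespace around each part.
    public const string Lenient = @"\(\s*(-?\d+)\s*,\s*(-?\d+)\s*\)";

    public static void Main()
    {
        string[] samples = { "(1,1)", "(-3,4)", "(1, 1)", "( -3 , 4 )" };
        foreach (string sample in samples)
        {
            bool strict = Regex.IsMatch(sample, Strict);
            bool lenient = Regex.IsMatch(sample, Lenient);
            Console.WriteLine($"{sample}: strict={strict}, lenient={lenient}");
        }
    }
}
```

The first two samples match both patterns, while the last two contain spaces and only match the lenient one.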

Nyerguds
  • You say it's trivial, but to someone who's never used REGEX it's plenty complicated. No need to belittle. – Jake Reece Apr 27 '18 at 05:40
  • 1
    True, but that's what research is for. The first year I started using regex extensively I had to look up pretty much everything every time I wanted to use it... but that's just how you learn, you know. Note that I did go out of my way to explain everything in detail. – Nyerguds Apr 27 '18 at 05:41
0

Assuming you had an arbitrary number of points, you could use something like this. It's almost certainly not very efficient, but unless you're processing heaps of them it probably doesn't matter:

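using System;
using System.Linq; // needed for Select

const string s = "AT (35,200) ADDROOM RM0001";

// Take the second space-separated token, strip the parentheses,
// then split on the comma and parse each piece lazily.
var numbers = s.Split(' ')[1]
               .Replace("(", string.Empty)
               .Replace(")", string.Empty)
               .Split(',').Select(int.Parse);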
Evan Trimboli