
I get a string result like this from a system, and I need to capture all the hex parts, excluding the 0x:

[System Info] 2.20.02 2.20.02 - Extended Data: 
0xAC, 0x4D, 0xDE, 0x04, 0xA4, 0x10, 0x73, 0x89, 0xDF, 0xFF, 0x01, 0x01, 0x01, 0xDF, 0x5A, 0x10, 
0x34, 0x37, 0x35, 0x36, 0x33, 0xC1, 0x10, 0x2A, 0x2A, 0x2A, 0x2A, 0x2A, 0x37, 0x38, 0x31, 0x32, 
0x9F, 0xDD, 0x01, 0xB5, 0x42, 0x03, 0x45, 0x56, 0x33, 0x2F, 0x02, 0x06, 0x00, 0x00, 0x00, 0x00, 
0x00, 0x15, 0xA3, 0x21, 0x03, 0x09, 0x51, 0x09, 0x9A, 0xE5, 0x16, 0x12, 0x21, 0x9F, 0x34, 0x03, 
0x03, 0x1E, 0x03, 0xCE, 0x04, 0x00, 0x12, 0x00, 0x00, 0xDF, 0xFF, 0x02, 0x01, 0x1A,

I have created a function which can help me extract substrings into an array:

+ (NSArray *)regexPattern:(NSString *)pattern toExtract:(NSString *)string {
    NSError *error = nil;
    NSRegularExpression *regexp = [NSRegularExpression regularExpressionWithPattern:pattern
                                    options:NSRegularExpressionCaseInsensitive error:&error];
    // A nil regexp (not a nil error) signals that the pattern failed to compile.
    if (regexp == nil) { return nil; }
    // matchesInString: returns one NSTextCheckingResult per match, so no entry needs to be dropped.
    NSArray *matches = [regexp matchesInString:string options:0 range:NSMakeRange(0, [string length])];
    NSMutableArray *result = [NSMutableArray array];
    for (NSTextCheckingResult *match in matches) {
        [result addObject:[string substringWithRange:[match range]]];
    }
    return result;
}

But now the problem is the regex. I tried to use a capture group () to capture only the hex part with this pattern: 0x(..),. This pattern captures the whole 0xFD, instead of just FD. If I use ([\dA-F]){2}, I get all the hex values, but I also capture 20 and 02 from 2.20.02 2.20.02, which I don't want. Some website told me that I would only get the data between the capture brackets, but that's not the case. Can somebody help? Thanks.

Chen Li Yong

2 Answers


In short, don't. Regular expressions are really useful, but not for such a simple, well-defined set of input.

See the top answer here for an explanation: RegEx match open tags except XHTML self-contained tags

Instead, use NSScanner. It is quite adept at scanning hex strings and skipping characters as needed. It'll be faster and more sane (the problem with regular expressions is that the fuzzy nature of the matching yields a parser that can often be easily spoofed, confused, or hacked by purposefully mal-constructed input).

This is a pretty good starting point:

Objective-C parse hex string to integer

I'd start by finding "Extended Data:", then use the scanner to skip the 0x, then scan to parse a hex number, then use the scanner to skip the ", 0x", and so on.
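The steps above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the function name HexBytesAfterMarker is hypothetical, and it assumes the marker text is exactly "Extended Data:" and that each value should be re-formatted as uppercase two-digit hex. It relies on scanHexInt: accepting an optional 0x prefix, so the prefix does not need to be skipped by hand.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): returns the hex values,
// without the "0x" prefix, that follow the "Extended Data:" marker.
static NSArray *HexBytesAfterMarker(NSString *input) {
    NSMutableArray *result = [NSMutableArray array];
    NSScanner *scanner = [NSScanner scannerWithString:input];

    // Treat whitespace, newlines, and commas as separators to skip over.
    NSMutableCharacterSet *skipped = [NSMutableCharacterSet whitespaceAndNewlineCharacterSet];
    [skipped addCharactersInString:@","];
    scanner.charactersToBeSkipped = skipped;

    // Jump past the header so version numbers like 2.20.02 are never examined.
    [scanner scanUpToString:@"Extended Data:" intoString:NULL];
    if (![scanner scanString:@"Extended Data:" intoString:NULL]) {
        return result; // marker not found: nothing to extract
    }

    unsigned int value = 0;
    // scanHexInt: accepts an optional 0x/0X prefix and stops at the first
    // character that is not a hex digit.
    while (!scanner.isAtEnd && [scanner scanHexInt:&value]) {
        [result addObject:[NSString stringWithFormat:@"%02X", value]];
    }
    return result;
}
```

Calling HexBytesAfterMarker on the string from the question should yield the values in order (AC, 4D, DE, ...), with nothing taken from the 2.20.02 header. The loop stops at the first token that is not hex, which may or may not be the behavior you want for malformed input.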

bbum
  • did you mean "regular expressions are really useful, *especially* for such a well defined, simple set of input" ? – Chen Li Yong Dec 21 '16 at 07:23
  • @ChenLiYong Nope. Regular expressions are a giant pain in the ass in all contexts(1). Where they are very useful is in dealing with relatively unstructured input where you need to fuzzy match to pull subsets of data. For *well defined structured input* you should pretty much never use a regex vs. something as simple as NSScanner, state machine, or a proper parser. (1)I've used, and continue to use, regular expressions often. But not for tasks like this. – bbum Dec 21 '16 at 16:21
  • Oh I see. I'll start to read about the `NSScanner` then. I've never heard of it before. I think from your explanation, `NSScanner` works similar to state machine or something. Thanks. – Chen Li Yong Dec 22 '16 at 02:37

You can use 0x(..), as your regular expression. When iterating through the matches, though, [string substringWithRange:[match range]] adds the entire matched portion of the string. Instead, add only the first group (the portion in parentheses), whose range is [match rangeAtIndex:1].

You could do it like this

for (NSTextCheckingResult * match in matches) {
    NSRange groupRange = [match rangeAtIndex:1];
    [result addObject:[string substringWithRange:groupRange]];
}
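
For context, here is how that loop might sit inside a complete function. This is a sketch under a few assumptions: the name FirstGroupMatches is hypothetical, and the tighter pattern 0x([0-9A-F]{2}) in the usage note is a suggestion rather than something from the question. Index 0 of a match is always the whole match, and capture groups are numbered from 1.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text of capture group 1 for every match.
static NSArray *FirstGroupMatches(NSString *pattern, NSString *string) {
    NSError *error = nil;
    NSRegularExpression *regexp =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    // Bail out if the pattern failed to compile or has no capture group.
    if (regexp == nil || regexp.numberOfCaptureGroups < 1) { return nil; }

    NSMutableArray *result = [NSMutableArray array];
    NSArray *matches = [regexp matchesInString:string
                                       options:0
                                         range:NSMakeRange(0, string.length)];
    for (NSTextCheckingResult *match in matches) {
        // rangeAtIndex:0 is the whole match; 1 is the first parenthesized group.
        NSRange groupRange = [match rangeAtIndex:1];
        // A group that did not participate in the match reports NSNotFound.
        if (groupRange.location != NSNotFound) {
            [result addObject:[string substringWithRange:groupRange]];
        }
    }
    return result;
}
```

For example, FirstGroupMatches(@"0x([0-9A-F]{2})", input) collects only the two hex digits after each 0x, and it skips 2.20.02 because that text contains no 0x prefix.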
Hasan
  • Wait, so you're saying that the capturing groups in the result is present on the `rangeAtIndex` ? So if I have 10 capturing groups in one regex syntax, I will be able to get each of the capturing group's content using that? Oh I see! – Chen Li Yong Dec 22 '16 at 02:35