
I'm not sure that regexes are the best solution here, but they seem logical enough; I'm just not sure how to actually implement this.

Basically, I want my user to be able to type in a method name and have it parsed for validity after each character input. I have defined a valid function format as a regex, and it is easy enough to test if the two match. The trouble is, I would like to be able to do a partial match, so as to let the user know, "So far, this is valid".

For example,

+(NSString *)thisIsAValid:(ObjectiveC *)method;

is a valid method. It can be matched by a simple regex string like

[\+-]\(\w+\s*\*?\)\w+....etc...

But I would love to have that same regex string also "match"

+(NSStr

(which I realize is sort of a backwards way of using regex). I would still not want the regex to match

Q)(NStr

Is there a way to implement something like this with standard regex functions, or would I have to do something more drastic?

Thanks so much! You guys are invaluable.


After further thought, I suppose I could make my question a bit clearer (and a bit more succinct): normally, one uses a regex to look for a pattern in text, e.g., how many times "cat" or "cot" appears in this paragraph. I wish to do the opposite: look for a string "in" a regex. That is, how much of this string, starting from the beginning, matches this regex? Ultimately, I would like to return the index at which the string ceases to match the regex in question.

Go Dan
Plastech
  • It seems like your real goal is to have a regex for a pattern that is *definitely invalid*, and then to trigger an error when that pattern is matched. Thus you might want to look into matching the complement of the language. Note that the complement of a regular language may or may not be regular! – Platinum Azure Jan 11 '12 at 17:39
  • Along those lines, I suggest trying to find a good description of the Objective-C grammar (probably a context-free grammar) and see how much you can match against the complement of the method declaration rule with regex. – Platinum Azure Jan 11 '12 at 17:50
  • +1 for an interesting question though! :-) – Platinum Azure Jan 11 '12 at 17:55
  • Good suggestions, though I don't know if they meet my ultimate goal: to tell how much of the regex is matched by the string. I'll need to go into my man cave and think on these new possibilities for a few :) – Plastech Jan 11 '12 at 20:30
  • Ah, now that I see your edits and your ultimate goal, a slightly different approach comes to mind. Figure out a regex for each component of the string, then match each individually, keeping track of how much of the string is matched each time, and on the first failure return the index where the patterns stopped matching. In other words, you're probably going to have to write a little "state machine" (composed of patterns represented by regular expressions). Unfortunately, I don't know Objective-C at all and so can't comment on the grammar or a specific approach. Maybe I'll draw up a C answer. – Platinum Azure Jan 11 '12 at 20:46
  • That sounds reasonable. I'm using a framework called RegexKit that allows for named captures, which would only make the task easier. – Plastech Jan 11 '12 at 21:16
  • Platinum Azure, thank you for your help. I believe I have a solution now. I store each individual chunk of the regex in an array: NSArray *regex = [NSArray arrayWithObjects: @"\s*", @"[\+-]", .... Then, as you said, I will use RegexKit's built-in functions to get the first matched range. Then I can match the next item in the regex array starting from the index where the last match ended. Rinse, repeat, and sum. – Plastech Jan 11 '12 at 21:20
  • Ah no, that fails on grouped sections. :( – Plastech Jan 11 '12 at 21:34
  • What you really want is the code that goes into the latest IDEs. You cannot really do this without knowing parts of the AST (abstract syntax tree). If you try to solve this with regexps, you will get deeper and deeper into trouble, much like quicksand. Eclipse does support this kind of feature (using the environment to suggest code completion, for instance), even within views other than the editor. So if your IDE is open source, or has extension points like those of Eclipse, I would suggest going in that direction. – Maarten Bodewes Jan 16 '12 at 00:54
  • Hi, I expect you'll be better off converting a regexp for a complete declaration into a DFA/NFA beforehand and simulating it as the user types in the declaration. However, be aware that this approach is limited, as 1) pure regexes cannot validate all common programming language constructs (e.g. expressions), and 2) common regex engines are actually much more powerful recognizers. Declarations without initialisations and type checks should come out fine but will probably be of limited use to your audience. Regards – collapsar Jan 17 '12 at 13:57
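
The chunk-by-chunk idea from the comments above can be sketched in plain C. This is only a minimal sketch under several assumptions: it uses POSIX <regex.h> rather than RegexKit, the helper name matched_prefix_len and the METHOD_CHUNKS table are hypothetical, and the chunks cover only the start of a method declaration. Each chunk is anchored at the position where the previous one stopped, and the function returns how many leading characters were consumed.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical chunk table: the first few pieces of a method declaration,
   written as POSIX extended regexes (no \w or \s, which are not portable). */
static const char *const METHOD_CHUNKS[] = {
    "[+-]",              /* class or instance method marker */
    "\\(",               /* opening parenthesis of the return type */
    "[[:alnum:]_]+",     /* type name */
    "[[:space:]]*\\*?",  /* optional whitespace and pointer star */
    "\\)",               /* closing parenthesis of the return type */
    "[[:alnum:]_]+",     /* first part of the selector */
};
#define N_METHOD_CHUNKS (sizeof METHOD_CHUNKS / sizeof METHOD_CHUNKS[0])

/* Hypothetical helper: match the chunks one after another, each anchored
   where the previous one ended. Returns the number of leading characters
   of input that were consumed, or -1 if a chunk cannot be compiled. */
long matched_prefix_len(const char *const *chunks, size_t n_chunks,
                        const char *input)
{
    assert(chunks != NULL && input != NULL);
    size_t pos = 0;

    for (size_t i = 0; i < n_chunks; i++) {
        char anchored[256];
        regex_t re;
        regmatch_t m;

        /* Anchor the chunk so it can only match at the current position. */
        int n = snprintf(anchored, sizeof anchored, "^(%s)", chunks[i]);
        if (n < 0 || (size_t)n >= sizeof anchored)
            return -1;
        if (regcomp(&re, anchored, REG_EXTENDED) != 0)
            return -1;

        int rc = regexec(&re, input + pos, 1, &m, 0);
        regfree(&re);
        if (rc != 0)
            break;               /* this chunk does not match: stop here */
        pos += (size_t)m.rm_eo;  /* advance past what the chunk consumed */
    }
    return (long)pos;
}
```

With this table, matched_prefix_len(METHOD_CHUNKS, N_METHOD_CHUNKS, "+(NSStr") returns 7, the full length of the input, so the text is "valid so far", while "Q)(NStr" returns 0. The input is acceptable as a prefix whenever the result equals strlen(input). The sketch shares the weakness already noted in the comments: chunks are matched greedily with no backtracking between them, so grouped or optional sections need extra care, and a multi-character literal chunk that is only half typed would be reported as a mismatch.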

2 Answers


RegexKit seems to use PCRE, so you could try using PCRE directly, as it has support for partial matches.

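#include <stdio.h>
#include <string.h>
#include <pcre.h>

/* Report whether subject matches the compiled pattern fully or partially. */
static void test_match(const pcre *re, const char *subject) {
  /* No ovector is passed, so a full match returns 0. PCRE_PARTIAL makes a
     subject that could still grow into a match return PCRE_ERROR_PARTIAL. */
  int rc = pcre_exec(re, NULL, subject, (int)strlen(subject),
                     0, PCRE_PARTIAL, NULL, 0);
  printf("%s: %s\n", subject,
         rc >= 0 || rc == PCRE_ERROR_PARTIAL ?
         "match or partial" : "no match");
}

int main(void) {
  const char *errstr;
  int erroffset;

  pcre *re = pcre_compile("^ABC$", 0, &errstr, &erroffset, NULL);
  if (re == NULL) {
    fprintf(stderr, "compile failed at offset %d: %s\n", erroffset, errstr);
    return 1;
  }

  test_match(re, "A");
  test_match(re, "AB");
  test_match(re, "ABC");
  test_match(re, "ABD");

  pcre_free(re);
  return 0;
}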

Will print:

A: match or partial
AB: match or partial
ABC: match or partial
ABD: no match
Mattias Wadman

I would just write a script to take the regex and split it up so that the regex ABCD becomes the new regex: ^(ABCD|ABC|AB|A)$

Parentheses will be tricky, so avoid them if possible. Otherwise you have to account for them, probably by just adding the correct number of closing parens. You'll also have to tokenize the ABCD string by regex operator so that you don't split up something like \*+
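
To make the rewriting step concrete, here is a minimal sketch in C. It assumes the regex has already been split into tokens, which is the hard part this answer only hints at, and it does not handle parentheses. The names build_prefix_regex and append are hypothetical helpers, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append s to out, tracking the used length; returns -1 if it does not fit. */
static int append(char *out, size_t out_size, size_t *used, const char *s)
{
    size_t len = strlen(s);
    if (*used + len + 1 > out_size)
        return -1;
    memcpy(out + *used, s, len + 1);  /* copies the terminating NUL too */
    *used += len;
    return 0;
}

/* Hypothetical helper: given tokens t1..tn, write the regex
   "^(t1..tn|t1..tn-1|...|t1)$" into out. Returns 0 on success,
   or -1 if out is too small. */
int build_prefix_regex(const char *const *tokens, size_t n_tokens,
                       char *out, size_t out_size)
{
    assert(tokens != NULL && out != NULL && n_tokens > 0);
    size_t used = 0;

    if (append(out, out_size, &used, "^(") != 0)
        return -1;

    /* Emit the longest alternative first, then drop one token at a time. */
    for (size_t keep = n_tokens; keep >= 1; keep--) {
        for (size_t i = 0; i < keep; i++) {
            if (append(out, out_size, &used, tokens[i]) != 0)
                return -1;
        }
        if (append(out, out_size, &used, keep > 1 ? "|" : ")$") != 0)
            return -1;
    }
    return 0;
}
```

Given the tokens "A", "B", "C" and "D", this writes ^(ABCD|ABC|AB|A)$ into the buffer. Keeping an operator sequence such as \*+ inside a single token is what stops it from being split between alternatives. Note that the generated regex does not match the empty string, so an empty input field would have to be treated as valid separately.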

Dan