
I'm trying to parse Obj-C source code with regex. I want to find both declarations and implementations.

First I look for classes like this:

@implementation(.|\n)+@end

@interface(.|\n)+@end
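A minimal sketch of the class-range step, in Python purely for illustration (lazy quantifiers, dot-matches-newline and line anchors are options in most regex engines). It assumes a hypothetical two-class sample. The greedy (.|\n)+ runs from the first @implementation to the last @end, so both classes end up in one match. A non-greedy .*? whose @end must start a line gives one block per class. An @end at the start of a line inside a comment would still fool it.

```python
import re

# Hypothetical sample: two classes in one file.
SOURCE = """\
@implementation Foo
- (void)a { }
@end

@implementation Bar
- (void)b { }
@end
"""

# The question's pattern: (.|\n)+ is greedy, so it runs to the LAST @end.
GREEDY = re.compile(r"@implementation(.|\n)+@end")

# Non-greedy .*? with DOTALL stops at the first @end that starts a line.
LAZY = re.compile(r"^@implementation\b.*?^@end\b", re.DOTALL | re.MULTILINE)


def class_blocks(source):
    """Return the text of each @implementation ... @end block."""
    return [m.group(0) for m in LAZY.finditer(source)]


if __name__ == "__main__":
    print(len(GREEDY.findall(SOURCE)))  # one match spanning both classes
    for block in class_blocks(SOURCE):
        print(block.splitlines()[0])
```

The same change applies to the @interface pattern.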

Then I use these patterns to find instance methods inside the ranges of the classes:

For interface:       -.*;
For implementation:  -.*{

However, matches inside comments, string literals, and math operations are also included.

Examples where these patterns fail:

//I'm pretending to-be an instance method;

/*
Disabled methods:
- (void)myProgrammerDidntLikeMe;
*/

if (a + b == 2) { ... }
str = @"-----";
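To reproduce the problem, here is a small Python sketch that runs the two patterns above over these examples. In the if-statement, a + b is replaced with a hypothetical a - b, because a + b contains no "-" and triggers neither pattern. Every match it finds is a false positive.

```python
import re

# The question's patterns ("{" escaped so it is unambiguously a literal brace).
INTERFACE_METHOD = re.compile(r"-.*;")
IMPL_METHOD = re.compile(r"-.*\{")

# The question's examples, with "a + b" changed to a hypothetical "a - b".
SNIPPET = """\
//I'm pretending to-be an instance method;
/*
Disabled methods:
- (void)myProgrammerDidntLikeMe;
*/
if (a - b == 2) { ... }
str = @"-----";
"""

if __name__ == "__main__":
    # None of these matches is a live method declaration or implementation.
    for m in INTERFACE_METHOD.findall(SNIPPET):
        print("interface:", m)
    for m in IMPL_METHOD.findall(SNIPPET):
        print("implementation:", m)
```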

How can I make the patterns exclude these pretending-to-be methods, and is there something else I have not foreseen?

Update: While experimenting with parsing a single method string, I noticed that the pattern also works for identifying methods. This is what I came up with:

(-|\+)\s\(([\w|\*|\s]*)\)(?:(?:(?:(\w*)(?:\:\([\w|\s|\*]*)\)(\w*)\s*){1,}))?(\w*)

However, unlike my first attempt, it does not find methods without a return type. I'm okay with that, since I have never seen one used:

- noReturnType

It doesn't know anything about comments and ifs, but 1) it's harder to fool with, for instance, math operators, and 2) it also parses the method itself.

This changes my question a bit: I'm trying to get capture-group output like the following, but I don't know how:

1. -
2. void
FOLLOWING_CAN_REPEAT
3. setFoo:
4. Foo*
5. foo
END_REPEAT
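Python's re, like most regex engines, keeps only the last repetition of a repeated capture group, so one pattern with {1,} cannot return every setFoo: / Foo* / foo triple. A common workaround, sketched here in Python under the assumption that signatures look like the ones in this question, is to match the sign and return type first and then collect the keyword parts one at a time with finditer. The function name parse_method and its flat-list output are hypothetical.

```python
import re

# Sign and return type, e.g. "- (void)" or "+(NSString *)".
HEAD = re.compile(r"\s*([-+])\s*\(([^)]*)\)\s*")
# One keyword part, e.g. "setFoo:(Foo *)foo". A repeated group would only
# keep its last repetition, so parts are collected one by one with finditer.
PART = re.compile(r"(\w+:)\s*\(([^)]*)\)\s*(\w+)\s*")
# A selector without arguments, e.g. "sharedInstance".
UNARY = re.compile(r"(\w+)\s*$")


def parse_method(signature):
    """Return [sign, return type, keyword, type, name, keyword, type, name, ...]
    for a method signature, or None if it does not look like one."""
    sig = signature.strip().rstrip(";{").strip()
    head = HEAD.match(sig)
    if head is None:
        return None
    out = [head.group(1), head.group(2).strip()]
    rest = sig[head.end():]
    pos = 0
    for m in PART.finditer(rest):
        if m.start() != pos:  # unparsed text between keyword parts
            return None
        out += [m.group(1), m.group(2).strip(), m.group(3)]
        pos = m.end()
    if pos:  # at least one keyword part was found
        return out if pos == len(rest) else None
    unary = UNARY.match(rest)
    return out + [unary.group(1)] if unary else None


if __name__ == "__main__":
    print(parse_method("- (void)setFoo:(Foo*)foo;"))
    # ['-', 'void', 'setFoo:', 'Foo*', 'foo']
```

Requiring each keyword part to start exactly where the previous one ended is what rejects lines like the commented-out "pretending" method instead of picking a partial match out of them.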
Johannes Lund
    If you want to do this reliably, you can't do it with regex. What are you going to do about #if? Macros? Header files? People keep learning this lesson: regex cannot be used for reliable processing of computer languages. If you don't mind making errors occasionally, you already have a solution that works (kind of); you can patch this to your heart's content and it will improve, but it will always have a fundamental flaw. If you need a real Objective C parser, I have another answer. – Ira Baxter Jul 26 '12 at 17:48
    "I'm trying to parse Obj-c source code with regex." This sounds like a bad idea to me. Are you sure it's the best option? –  Jul 26 '12 at 17:48
    For some advice on parsing code with regex: http://stackoverflow.com/q/1732348/1487063 (my favorite SO question). – Dustin Jul 26 '12 at 17:52
  • @IraBaxter Thanks! You're right, regex is probably not the best option, and what I'm really looking for is a real Obj-C parser. I have decided to create a new question for that: http://stackoverflow.com/questions/11675661/parse-tokenize-objective-c-with-objective-c – Johannes Lund Jul 26 '12 at 18:32
  • However, I'm not too confident about finding a real parser for iOS, so until I find one, this is the best solution. I don't think it is impossible to do this with regex, as long as you don't take on more than you can handle. And I feel it won't matter if it gets preprocessor (#) statements wrong: it doesn't have to be completely perfect, just as perfect as possible without too much difficulty. – Johannes Lund Jul 26 '12 at 19:59
  • You do know that a method doesn't need to specify the return type, correct? – Richard J. Ross III Jul 26 '12 at 23:40
  • @RichardJ.RossIII Yes, I do. All right, it would be stupid of me to require a return type, but where would you leave it out? – Johannes Lund Jul 27 '12 at 01:13
    @JohannesLund legacy headers include such methods, because the first versions of ObjC didn't support return types. They are also useful in protocols, where you don't care what is returned from a method. They can also be used as a form of generics, but that's finicky at best. – Richard J. Ross III Jul 27 '12 at 01:57
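When the return type is omitted, the compiler treats it as id. A minimal Python sketch (the helper name method_head is hypothetical) makes the parenthesized return type optional and reports id when it is missing, so a declaration like - noReturnType is no longer skipped:

```python
import re

# Sign, an OPTIONAL parenthesized return type, then the selector part.
HEAD = re.compile(r"\s*([-+])\s*(?:\(([^)]*)\))?\s*(\w.*)", re.DOTALL)


def method_head(signature):
    """Return (sign, return type, selector text) or None.

    An omitted return type is reported as "id", which is what the compiler
    assumes when no type is written.
    """
    m = HEAD.match(signature)
    if m is None:
        return None
    ret_type = m.group(2).strip() if m.group(2) is not None else "id"
    return m.group(1), ret_type, m.group(3).rstrip(";{ \n")
```

Requiring the selector to begin with a word character keeps lines such as str = @"-----"; and a - b from matching.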

2 Answers


If you want to do this reliably, you can't do it with regex.

What are you going to do about #if? Macros? Header files? People keep learning this lesson: regex cannot be used for reliable processing of computer languages.

If you don't mind making errors occasionally, you already have a solution that works (kind of); you can patch it to your heart's content and it will improve, but it will always have a fundamental flaw.
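As an illustration of that kind of patch (not the answerer's code), here is a Python sketch that blanks out comments and string/character literals before matching and requires a method signature to start its own line. It handles the examples from the question, but it still knows nothing about #if, macros, or headers. The names blank_noise and find_methods are hypothetical.

```python
import re

# One alternation scanned left to right: whichever construct starts first wins,
# so "//" inside a string stays string, and a quote inside a comment stays comment.
NOISE = re.compile(
    r'//[^\n]*'                  # line comment
    r'|/\*.*?\*/'                # block comment (C comments do not nest)
    r'|@?"(?:\\.|[^"\\\n])*"'    # C or Objective-C (@"...") string literal
    r"|'(?:\\.|[^'\\\n])*'",     # character literal
    re.DOTALL,
)

# A method must start its own line: "- (type)..." up to the first ";" or "{".
METHOD = re.compile(r"^[ \t]*([-+][ \t]*\([^)]*\)[^;{]*)[;{]", re.MULTILINE)


def blank_noise(source):
    """Overwrite comments and literals with spaces, keeping newlines in place."""
    return NOISE.sub(lambda m: re.sub(r"[^\n]", " ", m.group(0)), source)


def find_methods(source):
    return [m.group(1).strip() for m in METHOD.finditer(blank_noise(source))]


SOURCE = """\
//I'm pretending to-be an instance method;
/*
Disabled methods:
- (void)myProgrammerDidntLikeMe;
*/
- (void)setFoo:(Foo *)foo;
- (void)run {
    str = @"- (void)fake;";
    if (a - b == 2) { }
}
"""

if __name__ == "__main__":
    print(find_methods(SOURCE))
    # ['- (void)setFoo:(Foo *)foo', '- (void)run']
```

Blanking with spaces instead of deleting keeps offsets and line numbers aligned with the original file, which helps when reporting where a method was found.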

Ira Baxter

I would use something like Yacc or Bison, fed with the Objective-C language grammar, to generate an efficient Objective-C source parser I can hook into.

Michael