I'm creating a markup editor in Objective-C. I need the following functionality:

  • Recognise the delimiters of a block, e.g. **block**
  • Delete the start and end "tags", e.g. "The next text is **bold**" becomes "The next text is bold"
  • Determine the start and end positions of the marked-up text in the resulting text: "The next text is bold"

Edit: As I may expand the syntax in the future (it is very limited at the moment), it is important that parsing be top-down, so that the start and end positions always correspond to the resulting text. For this reason, regex may not be the best solution.

What is the best way to do this?

RaelG
  • Why do you think regex isn't the right solution? In my opinion it's like 'the only solution' to create a well-working format-parser. You can use some sample-code about bb-codes to create a wiki-variant. To give you a little example: text: **bold**, regex: \\*\\*([^\\*]+)\\*\\*, results in $1=bold. – cutsoy Aug 22 '10 at 10:38
  • I guess you are right. I was thinking that if I match a bold block then an italic block then my indexes would be wrong, but I could just subtract the difference in characters depending on start position. – RaelG Aug 22 '10 at 10:52
  • You certainly do not need regular expressions to do this. You can also use a tool like ANTLR or Bison to get a parser going for this. It gets complicated quickly, how would you parse `**3*5**` with your regular expression? (Valid here on SO) – Stefan Arentz Aug 22 '10 at 15:15

2 Answers

In the end I went for the regex approach, using RegexKitLite.

The code below is not fully tested but does work with the case St3fan pointed out.

- (NSArray *)scanContent:(NSMutableString **)content {
    // Every marker pair ([[...]] or **...**) removes 2 + 2 characters.
    static const NSInteger kOpenLength = 2;
    static const NSInteger kPairLength = 4;

    // Autoreleased containers avoid the leaks of the alloc/init versions.
    NSMutableArray *tokens = [NSMutableArray array];

    // Patterns and token names are paired by index.
    NSArray *captureRegex = [NSArray arrayWithObjects:
                             @"\\[\\[(.*?)\\]\\]", @"\\*\\*(.*?)\\*\\*", nil];
    NSArray *tokenID = [NSArray arrayWithObjects:@"Italic", @"Bold", nil];

    NSUInteger index = 0;

    for (NSString *capture in captureRegex) {
        NSRange captureRange;
        NSRange stringRange = NSMakeRange(0, [*content length]);

        do {
            captureRange = [*content rangeOfRegex:capture inRange:stringRange];
            if (captureRange.location != NSNotFound) {
                NSInteger matchStart = (NSInteger)captureRange.location;
                NSInteger matchEnd = (NSInteger)NSMaxRange(captureRange);

                // Shift tokens found earlier: those after the match lose both
                // markers, those inside it lose only the opening marker.
                // Note: the Length of an enclosing token is not adjusted here.
                for (NSMutableDictionary *dict in tokens) {
                    NSInteger start = [[dict objectForKey:@"Start"] integerValue];
                    NSInteger shift = 0;
                    if (start >= matchEnd) {
                        shift = kPairLength;
                    } else if (start > matchStart) {
                        shift = kOpenLength;
                    }
                    if (shift > 0) {
                        [dict setObject:[NSNumber numberWithInteger:start - shift]
                                 forKey:@"Start"];
                    }
                }

                // Replace the whole match with its first capture group, once.
                [*content replaceOccurrencesOfRegex:capture withString:@"$1" range:captureRange];
                NSLog(@"Replaced Content: %@", *content);

                // Record the token using positions in the stripped text.
                NSMutableDictionary *dictionary = [NSMutableDictionary dictionary];
                [dictionary setObject:[tokenID objectAtIndex:index] forKey:@"Token"];
                [dictionary setObject:[NSNumber numberWithInteger:matchStart]
                               forKey:@"Start"];
                [dictionary setObject:[NSNumber numberWithInteger:matchEnd - matchStart - kPairLength]
                               forKey:@"Length"];
                [tokens addObject:dictionary];

                // Continue searching just after the text that replaced the match.
                stringRange.location = captureRange.location + captureRange.length - kPairLength;
                stringRange.length = [*content length] - stringRange.location;
            }
        }
        while (captureRange.location != NSNotFound);

        index++;
    }
    return tokens;
}
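
For comparison, here is a minimal sketch of the single-pass, left-to-right alternative raised in the question, written in plain C so that it also compiles unchanged inside an Objective-C file. It is not part of the RegexKitLite code above. The names `Token` and `strip_markup` are hypothetical, and the sketch assumes the same two marker pairs, ASCII or UTF-8 input measured in bytes, and no nested markup. Because each token is recorded at the moment its text is written to the output, its position is already in terms of the stripped text and no later correction is needed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token record: positions refer to the stripped output text. */
typedef struct {
    const char *kind;   /* "Bold" or "Italic" */
    size_t start;       /* byte offset of the block in the output */
    size_t length;      /* byte length of the block in the output */
} Token;

/*
 * Copies `in` to `out` without markers and records one Token per block.
 * `out` must hold at least strlen(in) + 1 bytes.
 * Returns the number of tokens written to `tokens`.
 * An opening marker with no closing marker, or one found after `tokens`
 * is full, is copied through as literal text.
 */
static size_t strip_markup(const char *in, char *out,
                           Token *tokens, size_t max_tokens) {
    static const struct { const char *open, *close, *kind; } rules[] = {
        { "**", "**", "Bold" },
        { "[[", "]]", "Italic" },
    };
    size_t count = 0;   /* tokens recorded so far */
    size_t o = 0;       /* write position in out */
    const char *p = in; /* read position in in */

    while (*p) {
        int matched = 0;
        for (size_t r = 0; r < sizeof rules / sizeof rules[0]; r++) {
            size_t open_len = strlen(rules[r].open);
            if (strncmp(p, rules[r].open, open_len) != 0) continue;

            const char *body = p + open_len;
            const char *end = strstr(body, rules[r].close);
            if (end == NULL || count == max_tokens) continue;

            /* Record the block where it lands in the output, then copy it. */
            size_t len = (size_t)(end - body);
            tokens[count].kind = rules[r].kind;
            tokens[count].start = o;
            tokens[count].length = len;
            count++;
            memcpy(out + o, body, len);
            o += len;
            p = end + strlen(rules[r].close);
            matched = 1;
            break;
        }
        if (!matched) out[o++] = *p++;  /* ordinary character */
    }
    out[o] = '\0';
    return count;
}
```

Called on "The next text is **bold**", this writes "The next text is bold" and records one Bold token with start 17 and length 4. The input `**3*5**` yields "3*5", because the closing marker is searched for as the two-character string `**` rather than as any single `*`.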
RaelG
  • Based on code in http://stackoverflow.com/questions/1634012/how-to-parse-some-wiki-markup by Blaenk. – RaelG Aug 22 '10 at 23:13

MarkdownSharp, the Markdown processor used on the Stack Exchange websites, is open source. Take a look at the source file; perhaps you can see how they do it, or port it to Objective-C.

Perhaps better yet, take a look at this question: "What is the simplest implementation of Markdown for a Cocoa application?"

It links to an open source application called MarkdownLive, which uses a C implementation of Markdown called Discount and also provides an Objective-C wrapper for it.

Jorge Israel Peña