0

I am new to iPhone development and have a small question about regular expressions. At present I am using the following regular expression in my project:

NSRegularExpression *regularExpression = 
   [NSRegularExpression regularExpressionWithPattern:@"href=\"(.*).zip\"" 
                                             options:NSRegularExpressionCaseInsensitive 
                                               error:&error];

It searches the page source of a website and returns results that match the pattern below:

href="kjv/36_Zep.zip"
href="kjv/37_Hag.zip"

But one of the links in the page source looks like this:

href="kjv/38_Zec.zip        "

I want to ignore the whitespace after the .zip. How is that possible? If anybody knows, please help me.

The Lazy Coder
user1531844
  • [Parsing HTML/XML with a Regex will never end well](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). Consider using an HTML or XML parser instead and extract that attribute. – Adam B Aug 09 '12 at 08:23
  • One thing to note: if the web page has a URL with spaces on the end, the browser will include %20 or + for each space in the URL. For example, kjv/38_Zec.zip++++++++ would be the URL for the last one in your example, which is a URL-encoded version of the URL. – The Lazy Coder Aug 09 '12 at 19:56

4 Answers

1

One way is to replace all whitespace in the matched string with the empty string, or to use a trim method on that string to remove the trailing spaces. Refer to "String replacement in Objective-C".

If you don't want to do that, use the whitespace pattern in your regular expression to match one or more whitespace characters.

\s includes \n(ewline) \r(eturn) \t(tab) \v(ertical tab) \f(orm feed) and space. If you want only a space, use " ", which is a literal blank space. Inside an Objective-C string literal the backslash itself has to be escaped, so \s is written as \\s.
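The trimming approach could look roughly like the sketch below. It is only an illustration: the helper name ZipLinksByTrimming and the looser pattern that captures everything between the quotes are assumptions made for this example, not something from the question.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: captures every href value, trims surrounding
// whitespace, and keeps only the values that end in ".zip".
static NSArray *ZipLinksByTrimming(NSString *html) {
    NSError *error = nil;
    // Capture everything between the quotes; cleanup happens afterwards.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"href=\"([^\"]*)\""
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSMutableArray *links = [NSMutableArray array];
    NSArray *matches = [regex matchesInString:html
                                      options:0
                                        range:NSMakeRange(0, [html length])];
    for (NSTextCheckingResult *match in matches) {
        NSString *raw = [html substringWithRange:[match rangeAtIndex:1]];
        NSString *trimmed = [raw stringByTrimmingCharactersInSet:whitespace];
        if ([[trimmed lowercaseString] hasSuffix:@".zip"]) {
            [links addObject:trimmed];
        }
    }
    return links;
}
```

The regex stays simple here and the string methods do the cleanup, which may be easier to read than a longer pattern.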

Pratik Mandrekar
1

You can match the examples you provided with the following regex. Note that the backslashes are doubled because the pattern sits inside an Objective-C string literal:

@"href=\"(.+)\\.zip\\s*\""

I modified your regex by adding

1) + (matches 1 or more of the preceding character) to capture the entire name before the .zip,
2) \ before the . so that it matches a literal dot instead of any character,
3) \s* to match (skip, in your case) zero or more whitespace characters.
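To see that pattern in use, here is a minimal sketch. The helper name ZipBaseNames is made up for this example, and it assumes each link sits on its own line as in the question, since the greedy .+ would otherwise run across several links on the same line.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text captured before ".zip" for every
// href in `html`, tolerating whitespace before the closing quote.
static NSArray *ZipBaseNames(NSString *html) {
    NSError *error = nil;
    // Each regex backslash is doubled in the literal: \\. and \\s.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"href=\"(.+)\\.zip\\s*\""
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    NSMutableArray *names = [NSMutableArray array];
    NSArray *matches = [regex matchesInString:html
                                      options:0
                                        range:NSMakeRange(0, [html length])];
    for (NSTextCheckingResult *match in matches) {
        // Group 1 is the part between href=" and .zip
        [names addObject:[html substringWithRange:[match rangeAtIndex:1]]];
    }
    return names;
}
```

Because the capture group ends before \\.zip, the returned strings do not include the extension; move the closing parenthesis after zip if you want it included.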

0x141E
0

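Suppose you are given NSString *test = @"...href=\"/functions?q=KEYWORD\x26amp... " and you want to run an NSRegularExpression over it. You can also trim the string first and then match against the trimmed copy, like this:

NSString *trimmed = [test stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
NSTextCheckingResult *result = [testRegex firstMatchInString:trimmed options:0 range:NSMakeRange(0, [trimmed length])];

The range has to be computed from the trimmed string, otherwise it can run past the end of the string being searched. With this approach you don't change anything in your NSRegularExpression. Note that it only removes whitespace at the two ends of the whole string, not spaces inside the quotes of an href.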

Amit Singh
0

I commonly use groups to gather the item I want. However, you need to know how groups work.

Older SDKs give you no way to name them (NSTextCheckingResult only gained rangeWithName: in iOS 11), so think of it this way.

Groups are numbered in the order in which their opening parenthesis appears in the pattern.

0 is the entire match.

1 is the first set of ()

2 is the second set of () and so on.

If you have a pattern like this:

NSString *matchString = @"(href)=\"((.*)[.]zip)\"";

you would have 4 groups.

Group 0 is the entire match, group 1 is the "href", group 2 is the entire filename and group 3 is the filename without the extension.
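The numbering can be checked with a small sketch like the one below. GroupsOfFirstMatch is a hypothetical helper written for this illustration; it just collects the text of every group of the first match.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text of every capture group (index 0 is
// the whole match) for the first match of `pattern` in `input`.
static NSArray *GroupsOfFirstMatch(NSString *pattern, NSString *input) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    NSTextCheckingResult *match =
        [regex firstMatchInString:input
                          options:0
                            range:NSMakeRange(0, [input length])];
    if (match == nil) {
        return @[];
    }

    NSMutableArray *groups = [NSMutableArray array];
    for (NSUInteger i = 0; i < [match numberOfRanges]; i++) {
        NSRange range = [match rangeAtIndex:i];
        // A group that took no part in the match reports NSNotFound.
        [groups addObject:(range.location == NSNotFound)
                              ? @""
                              : [input substringWithRange:range]];
    }
    return groups;
}
```

Calling it with the pattern @"(href)=\"((.*)[.]zip)\"" on one of the links from the question should give four entries, in the order described above.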

Hope that helps.

NSError *error = nil;
NSRegularExpression *regex = 
   [NSRegularExpression regularExpressionWithPattern:@"href=\"(.*[.]zip)[^\"]*\"" 
                                             options:NSRegularExpressionCaseInsensitive 
                                               error:&error];

NSMutableArray *foundMatches = [NSMutableArray array];

// Group 1 is the path up to and including ".zip"; [^\"]* then swallows
// anything (such as trailing spaces) left before the closing quote.
[regex enumerateMatchesInString:originalString 
                        options:0 
                          range:NSMakeRange(0, [originalString length]) 
                     usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
                         // Whole match plus one capture group gives 2 ranges.
                         if (result.numberOfRanges == 2){
                             [foundMatches addObject:[originalString substringWithRange:[result rangeAtIndex:1]]];
                         }
                     }];

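One caveat about a .zip that appears inside the filename and is not the extension: because .* is greedy, the pattern above captures up to the last .zip on the line, so two links on the same line would be merged into one match.

e.g. href="my.zip.file.zip" gives match group 1 as "my.zip.file.zip" with the greedy .*, whereas a non-greedy .*? would stop at the first .zip and give "my.zip".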
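To check how a filename with more than one .zip is handled, the greedy and non-greedy forms of the pattern can be compared side by side. This is only a sketch; FirstCapture is a hypothetical helper, and the .*? variant is added here purely for comparison.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns capture group 1 of the first match of
// `pattern` in `input`, or nil if there is no match.
static NSString *FirstCapture(NSString *pattern, NSString *input) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return nil;
    }
    NSTextCheckingResult *match =
        [regex firstMatchInString:input
                          options:0
                            range:NSMakeRange(0, [input length])];
    if (match == nil || [match numberOfRanges] < 2) {
        return nil;
    }
    return [input substringWithRange:[match rangeAtIndex:1]];
}

// Greedy: .* extends to the last ".zip" before the closing quote.
static NSString *const kGreedyPattern = @"href=\"(.*[.]zip)[^\"]*\"";
// Non-greedy: .*? stops at the first ".zip"; [^\"]* absorbs the rest.
static NSString *const kLazyPattern = @"href=\"(.*?[.]zip)[^\"]*\"";
```

For href="my.zip.file.zip", the greedy pattern yields "my.zip.file.zip" and the non-greedy one yields "my.zip".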

The Lazy Coder