I am trying to scan NSStrings for hashtags in Objective-C using a regular expression. I posted a test status on Facebook to see which hashtags it treats as valid, since I want my hashtag detection to follow the same pattern. My problem is that my regex still matches hashtags that are not preceded by a space, i.e. where the "#" comes directly after an alphanumeric character. In somethin#idfsjoa, the #idfsjoa is detected as a hashtag when it shouldn't be. I am using RegexPal to test my regex.

How do I make sure the "#" is only matched when it comes after a space (or at the start of the string)?

From Facebook:

[Screenshot of the Facebook test status]

The NSString:

#face #Fa!ce something #iam#1 #1 #919 #jifdosaj somethin#idfsjoa #9#9#98 9#9f9j#9jlasdjl #jklfdsajl34 #34239 #jkf #a #1j3rj3

The regular expression I currently have:

(?!\w+)#(\w+)([A-Za-z0]+)
SleepNot
  • What are the expected results from your string? – hwnd Oct 16 '14 at 02:49
  • Where's the Obj-C code you're using? – l'L'l Oct 16 '14 at 02:51
  • I didn't include the Objective-C code I am using since I only need help with the regex itself. I only added the Obj-C tag to let people know I am using regex on iOS. @l'L'l – SleepNot Oct 16 '14 at 03:03
  • You were too quick on the correct answer; I had an obj-c solution, but didn't have a chance to post. Also on the last hashtag in your example, is it supposed to match? – l'L'l Oct 16 '14 at 03:30
  • That is because I can already get the matching strings from the checking results in Objective-C. My only problem was the regex, as my own pattern still matches an invalid hashtag. Feel free to post your solution anyway; it might help other people who come across this question someday. – SleepNot Oct 16 '14 at 04:39

1 Answer

This seems to match your criteria:

(?:\s|^)(#(?:[a-zA-Z].*?|\d+[a-zA-Z]+.*?))\b

Note that the hashtag itself will be the first (and only) capture.
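
As a rough illustration of reading that capture, here is a minimal Python sketch (Python is used only because the thread contains no Objective-C code; the function name `extract_hashtags` and the constant names are hypothetical). It applies the pattern above unchanged to the test string from the question and keeps only capture group 1, so the whitespace consumed by `(?:\s|^)` never ends up in the result.

```python
import re

# Pattern copied verbatim from the answer above.
HASHTAG_RE = re.compile(r"(?:\s|^)(#(?:[a-zA-Z].*?|\d+[a-zA-Z]+.*?))\b")

# Test string from the question.
SAMPLE = (
    "#face #Fa!ce something #iam#1 #1 #919 #jifdosaj somethin#idfsjoa "
    "#9#9#98 9#9f9j#9jlasdjl #jklfdsajl34 #34239 #jkf #a #1j3rj3"
)


def extract_hashtags(text):
    """Return capture group 1 of each match (hypothetical helper).

    group(0) would include the leading whitespace that the pattern
    consumes; group(1) is the hashtag alone.
    """
    return [match.group(1) for match in HASHTAG_RE.finditer(text)]


if __name__ == "__main__":
    print(extract_hashtags(SAMPLE))
```

In Objective-C the equivalent step would presumably be to read the range of capture group 1 from each match result rather than the range of the whole match, which avoids trimming spaces afterwards.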

[RegexPal screenshot]

jmar777
  • Awesome! Is there a way to remove the spaces at the start of the detected hashtags? – SleepNot Oct 16 '14 at 03:05
  • The space is technically "matched" by the regex, but only the hashtag itself is included in the capture (the `(#(?:[a-zA-Z].*?|\d+[a-zA-Z]+.*?))` part of the expression). I'm honestly not an Objective-C programmer so I can't tell you the exact approach, but virtually *all* regex engines will let you access captures within the match... which is what you'd want here. – jmar777 Oct 16 '14 at 03:09
  • I think this will do. I'll just trim the spaces on the hashtags. – SleepNot Oct 16 '14 at 03:09
  • Looks like you want something like this: http://stackoverflow.com/a/9276827/376789 – jmar777 Oct 16 '14 at 03:10
  • The regex above does not seem to match "#world" in "#hello!#world", which I think should be a valid hashtag. Any suggestions? – SleepNot Oct 16 '14 at 06:36
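
One possible direction for the case raised in the last comment, sketched in Python under stated assumptions: instead of requiring whitespace before the "#", reject only a "#" that directly follows a word character, using a negative lookbehind. This pattern is not from the answer above; it is a hypothetical alternative, and it assumes a valid hashtag is optional leading digits, then at least one ASCII letter, then any further word characters. Whether Facebook actually accepts "#world" in "#hello!#world" is the commenter's assumption and is not verified here.

```python
import re

# Hypothetical alternative pattern (an assumption, not the answer's regex):
#   (?<!\w)    the "#" must not directly follow a letter, digit, or underscore
#   \d*        optional leading digits
#   [A-Za-z]   at least one ASCII letter is required
#   \w*        any remaining word characters
LOOKBEHIND_RE = re.compile(r"(?<!\w)#\d*[A-Za-z]\w*")


def find_hashtags(text):
    """Return every hashtag matched by the lookbehind pattern (hypothetical helper)."""
    # The pattern has no capture groups, so findall returns whole matches,
    # and the lookbehind consumes nothing, so there is no leading space to trim.
    return LOOKBEHIND_RE.findall(text)


if __name__ == "__main__":
    print(find_hashtags("#hello!#world"))
    print(find_hashtags("somethin#idfsjoa #iam#1 #919"))
```

Because a lookbehind is zero-width, the whole match is already the bare hashtag. ICU regular expressions, which NSRegularExpression is built on, document look-behind assertions, so a similar pattern should be portable, though it has not been tested in Objective-C here.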