
This is somewhat related to: Regular Expression - Formatting text in a block - IM but a different problem.

I'm looking for a regular expression that matches text wrapped in a pair of - characters, under the following conditions:

  • the token can be at the start or end of a line
  • the token must be surrounded by a space or by one or more symbols: {.,!@#$....}
    • a normal character [a-zA-Z] must not be adjacent to the - pair in question
    • see sample test 3: ...w-thank you-
    • tests 4 and 5 succeed because the - pair is wrapped with [^a-zA-Z]
  • the opening - must not be followed by a space, and the closing - must not be preceded by a space
    • "-Wow -" is not a match because the closing - is preceded by a space
    • see sample tests 6 and 7

For the front of the regular expression I would need: (^|[\s\W]+)
and the end would be: ($|[\s\W]+)

Here is my current expression, but it fails because the middle group stops matching as soon as it finds the first -:

   (^|[\s\W]+)-([^\s][^-]*)-($|[\s\W]+)

Sample test strings would be:

  1. (all.): -Wow-thank you-.
  2. (Wow): -Wow- thank you-!
  3. (NIL): - Wow-thank you-.
  4. (thank you): - Wow!-thank you-
  5. (thank you): - Wow -thank you-
  6. (all): -Wow - thank you-
  7. (NIL): -Wow - thank you -
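
For reference, a minimal sketch of how the current expression can be checked against these samples. It assumes Python's `re` module (no regex flavor is named here), reads "all" as everything between the outermost dashes, and uses hypothetical names (`SAMPLES`, `first_token`, `failures`) purely for illustration.

```python
import re

# The current expression from above (regex flavor assumed: Python's re).
CURRENT = re.compile(r"(^|[\s\W]+)-([^\s][^-]*)-($|[\s\W]+)")

# (input, expected token). None stands for NIL; "all" is interpreted as
# everything between the outermost dashes (an assumption).
SAMPLES = [
    ("-Wow-thank you-.", "Wow-thank you"),
    ("-Wow- thank you-!", "Wow"),
    ("- Wow-thank you-.", None),
    ("- Wow!-thank you-", "thank you"),
    ("- Wow -thank you-", "thank you"),
    ("-Wow - thank you-", "Wow - thank you"),
    ("-Wow - thank you -", None),
]


def first_token(pattern, text):
    """Return capture group 2 of the first match, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(2) if match else None


def failures(pattern):
    """Return (text, expected, actual) for every sample that misses its target."""
    results = [(text, expected, first_token(pattern, text))
               for text, expected in SAMPLES]
    return [row for row in results if row[1] != row[2]]


if __name__ == "__main__":
    for text, expected, actual in failures(CURRENT):
        print(f"{text!r}: expected {expected!r}, got {actual!r}")
```

Under these assumptions, samples 1, 6 and 7 are the ones that miss: `[^-]*` cannot cross an inner -, and nothing stops it from ending on a space.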

Does this require lookbehind? (I'm a regex newbie, so please bear with me.) Or is my middle condition totally wrong?

Thank you much!
mwolfe.

Mike Wolfe
  • I don't understand your 1st comment after your 2nd condition. – Rohit Jain Mar 08 '13 at 06:52
  • Example 3 fails by this condition because there is a character "w" before the "-thank you-". If that character was a space or a symbol then "-thank you-" would have been identified/flagged/found. That is why Example 4 works: the character before the "-" was a "!" – Mike Wolfe Mar 08 '13 at 06:56
  • +1 - Very nice attempt, especially for a (self-proclaimed) newbie. – Andrew Cheong Mar 08 '13 at 07:16
  • Some unrelated pointers, though you may already know. (1) Beware that the `\w` class includes the underscore and numbers! It may be better to use `[a-zA-Z]` or `[a-zA-Z0-9]`. (2) You can invoke case-insensitive matching by using the `/.../i` modifier, thereby only needing to write `[a-z]` or `[a-z0-9]`. (3) Beware that the universe of characters isn't limited to letters, numbers, and symbols. You say the tokens have to be surrounded by a space or one or more symbols. But then you say, as if equivalently, that it must not be surrounded by `[a-zA-Z]`. These are not necessarily the same. – Andrew Cheong Mar 08 '13 at 07:23

1 Answer

Try a simpler middle expression.

(^|[\s\W]+)-(.*?)-($|[\s\W]+)
             ^^^

The non-greedy wildcard match would capture the minimum string necessary to match the following -($|[\s\W]+).
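
To illustrate the difference, here is a small sketch comparing the non-greedy middle group with a greedy one on sample 2 from the question. It assumes Python's `re` module, since the question does not name a regex flavor; the variable names are only for illustration.

```python
import re

text = "-Wow- thank you-!"  # sample 2 from the question

# Same front and end groups; only the middle group differs.
lazy = re.compile(r"(^|[\s\W]+)-(.*?)-($|[\s\W]+)")
greedy = re.compile(r"(^|[\s\W]+)-(.*)-($|[\s\W]+)")

lazy_token = lazy.search(text).group(2)
greedy_token = greedy.search(text).group(2)

print(repr(lazy_token))    # 'Wow'            (stops at the first usable closing -)
print(repr(greedy_token))  # 'Wow- thank you' (runs to the last usable closing -)
```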


Edit. Okay, I see why that's wrong. You want a non-space character to immediately follow the opening dash and immediately precede the closing dash. So try this:

(^|[\s\W]+)-(\S.*?\S)-($|[\s\W]+)
             ^^   ^^
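
As a rough check, the revised expression can be run against the seven samples from the question. This is only a sketch: it assumes Python's `re` module, takes the first match per line, and reads "all" as everything between the outermost dashes. `SAMPLES` and `first_token` are hypothetical names.

```python
import re

# Revised expression from this answer (regex flavor assumed: Python's re).
REVISED = re.compile(r"(^|[\s\W]+)-(\S.*?\S)-($|[\s\W]+)")

# (input, expected token). None stands for NIL; "all" is interpreted as
# everything between the outermost dashes (an assumption).
SAMPLES = [
    ("-Wow-thank you-.", "Wow-thank you"),
    ("-Wow- thank you-!", "Wow"),
    ("- Wow-thank you-.", None),
    ("- Wow!-thank you-", "thank you"),
    ("- Wow -thank you-", "thank you"),
    ("-Wow - thank you-", "Wow - thank you"),
    ("-Wow - thank you -", None),
]


def first_token(text):
    """Return capture group 2 of the first match, or None if nothing matches."""
    match = REVISED.search(text)
    return match.group(2) if match else None


for text, expected in SAMPLES:
    actual = first_token(text)
    status = "ok" if actual == expected else "MISMATCH"
    print(f"{status:8} {text!r} -> {actual!r}")
```

One caveat: `\S.*?\S` needs at least two characters, so a one-character token such as `-a-` is not matched by this expression.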
Andrew Cheong