6

In a sentence similar to:

Lorem ipsum +dolor ++sit amet.

I'd like to match the +dolor but not the ++sit. I could do it with a lookbehind, but since JavaScript does not support lookbehinds, I'm struggling to build a pattern for it.

So far I've tried it with:

(?:\+(.+?))(?=[\s\.!\!]) - but it matches both words
(?:\+{1}(.+?))(?=[\s\.!\!]) - the same here - both words are matched

and to my surprise a pattern like:

(?=\s)(?:\+(.+?))(?=[\s\.!\!])

doesn't match anything. I thought I could trick it by using \s, or later also ^, before the + sign, but it doesn't seem to work like that.
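A minimal sketch of why the last attempt cannot match, added for illustration (the variable names are made up here): a lookahead is zero-width, so `(?=\s)` requires the next character to be whitespace and then `\+` requires that same character to be a plus sign. Consuming the whitespace instead lets the `+` be tested against the following character.

```javascript
const sample = 'Lorem ipsum +dolor ++sit amet.';

// The pattern from the question: the lookahead and \+ test the same character.
const originalAttempt = /(?=\s)(?:\+(.+?))(?=[\s\.!\!])/;
const originalMatches = originalAttempt.test(sample);

// Reduced to the contradictory part: "next char is whitespace AND is a plus".
const lookaheadThenPlus = /(?=\s)\+/;
const reducedMatches = lookaheadThenPlus.test(sample);

// Consuming the whitespace moves the position forward before \+ is tried.
const spaceThenPlus = /\s\+/;
const consumingMatches = spaceThenPlus.test(sample);

console.log(originalMatches);  // false
console.log(reducedMatches);   // false
console.log(consumingMatches); // true
```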


EDIT - background information:

It's not strictly part of the question, but it sometimes helps to know what this is for, so here is a short explanation to address some of your questions and comments:

  • any word in any order can be marked by either a + or a ++
  • each word and its marking will be replaced by a <span> later
  • cases like lorem+ipsum are considered invalid, because that would be like splitting a word (ro+om) or writing two words together as one (myroom), so it has to be corrected anyway. The pattern may match such cases, and that is not an error; it should, however, at least match the normal cases like the example above
  • I use a lookahead like (?=[\s\.!\!]) so that I can match words in any language and not only \w characters
t3chb0t
  • Did you want to match `+bar` in `foo+bar`? – Avinash Raj Jan 14 '15 at 11:37
  • No, it is a sentence and there won't be such cases. There will always be either a space `\s` or `^` before the `+`. – t3chb0t Jan 14 '15 at 11:42
  • Then why did you accept the answer that captures `+bar` in `foo+bar`? – Avinash Raj Jan 14 '15 at 11:45
  • Because a `foo+bar` would be a typo and needs to be corrected anyway. It would be the same as if I wrote _myroom_ instead of _my room_. – t3chb0t Jan 14 '15 at 11:47
  • @AvinashRaj as you probably know it's not always easy or obvious which answer to accept like in this case. I picked @TimPietzcker's answer because although it isn't perfect (like you've said it matches too many cases) it explains the trick to match only one `+` before a word. After all I mixed his and @hsz's answer with my own pattern and solved it. We need another option like _partial answer_ ;-) – t3chb0t Jan 14 '15 at 12:34

5 Answers

3

One way would be to match one additional character and ignore that (by putting the relevant part of the match into a capturing group):

(?:^|[^+])(\+[^\s+.!]+)

However, this breaks down if potential matches could be directly adjacent to each other.

Test it live on regex101.com.

Explanation:

(?:         # Match (but don't capture)
 ^          # the position at the start of the string
|           # or
 [^+]       # any character except +.
)           # End of group
(           # Match (and capture in group 1)
 \+         # a + character
 [^\s+.!]+  # one or more characters except [+.!] or whitespace.
)           # End of group
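As a rough illustration of how this pattern might be used from JavaScript (the helper name `findSinglePlusWords` is made up for this sketch), the loop below collects capturing group 1 from every match. The second call shows the adjacency limitation mentioned above: the `+` of `+sit` has no non-plus character left in front of it to consume.

```javascript
// Hypothetical helper: returns the contents of group 1 for every match.
function findSinglePlusWords(text) {
  const pattern = /(?:^|[^+])(\+[^\s+.!]+)/g;
  const found = [];
  let match;
  while ((match = pattern.exec(text)) !== null) {
    // match[0] includes the extra leading character; match[1] does not.
    found.push(match[1]);
  }
  return found;
}

console.log(findSinglePlusWords('Lorem ipsum +dolor ++sit amet.')); // [ '+dolor' ]
console.log(findSinglePlusWords('+dolor+sit'));                     // [ '+dolor' ]
console.log(findSinglePlusWords('foo+bar'));                        // [ '+bar' ]
```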
Tim Pietzcker
  • It's hard to pick an answer, but I think this one led me in the right direction, and I'll go with a pattern like `(?:^|\s)(\+([^+\s]+))(?=[\s\.!\!])`, which is a kind of combination of both patterns. Matching the additional character at the beginning did the trick. I've added another group because I need to replace it later, so it solves this particular problem. – t3chb0t Jan 14 '15 at 11:41
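A sketch of how the combined pattern from the comment above could drive the replacement, under some assumptions: the function name `markWords`, the `marked` class, and the choice to drop the `+` from the output are all made up here, since the thread does not say what the `<span>` should look like. Because `(?:^|\s)` consumes the leading whitespace, the callback puts it back in front of the replacement.

```javascript
// Hypothetical helper: wraps each single-plus word in a <span>.
function markWords(text) {
  const pattern = /(?:^|\s)(\+([^+\s]+))(?=[\s\.!\!])/g;
  return text.replace(pattern, (whole, marked, word) => {
    // "whole" may start with the consumed whitespace; keep it in the output.
    const prefix = whole.slice(0, whole.length - marked.length);
    return `${prefix}<span class="marked">${word}</span>`;
  });
}

console.log(markWords('Lorem ipsum +dolor ++sit amet.'));
// Lorem ipsum <span class="marked">dolor</span> ++sit amet.
```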
3
\+\+|(\+\S+)

Grab the content from capturing group 1. The regex uses the trick described in this answer.

Demo on regex101

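var re = /\+\+|(\+\S+)/g;
var str = 'Lorem ipsum +dolor ++sit ame';
var m;
var o = [];  // collected single-plus words

while ((m = re.exec(str)) !== null) {
    // Guard against an infinite loop on zero-length matches.
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }

    // Group 1 is only set when the second alternative matched;
    // a bare "++" match leaves it undefined and is skipped.
    if (m[1] != null) {
        o.push(m[1]);
    }
}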

If you have input like +++donor, use:

\+\++|(\+\S+)
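To make the difference between the two variants concrete, here is a small comparison (the helper name `collectGroup1` is made up for this sketch). With the first pattern, `++` is consumed from `+++donor` and the remaining `+donor` is then captured; the second pattern swallows the whole run of plus signs.

```javascript
// Hypothetical helper: collects group 1 wherever it participated in the match.
function collectGroup1(pattern, text) {
  const found = [];
  for (const match of text.matchAll(pattern)) {
    if (match[1] !== undefined) {
      found.push(match[1]);
    }
  }
  return found;
}

console.log(collectGroup1(/\+\+|(\+\S+)/g, '+++donor'));  // [ '+donor' ]
console.log(collectGroup1(/\+\++|(\+\S+)/g, '+++donor')); // []
console.log(collectGroup1(/\+\++|(\+\S+)/g, 'Lorem ipsum +dolor ++sit amet.')); // [ '+dolor' ]
```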
vks
  • Whoever did that obviously doesn't know the difference between capturing and non-capturing matches... @t3chb0t I think you should rather accept this answer as it doesn't suffer from problems with adjacent matches like `+dolor+sit` (where mine would only find `+dolor`). – Tim Pietzcker Jan 14 '15 at 12:20
  • @TimPietzcker He has to replace the captured matches with something else, so yours might be better; replacing with this one would require more effort. – vks Jan 14 '15 at 12:21
1

The following regex seems to be working for me:

var re = / (\+[a-zA-Z0-9]+)/  // Note the space after the '/'

Demo

https://www.regex101.com/r/uQ3wE7/1

Vivendi
1

I think this is what you needed.

(?:^|\s)(\+[^+\s.!]*)(?=[\s.!])
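A quick way to try this pattern in JavaScript, as a sketch (the helper name `findMarked` is made up here). Because the `+` must be preceded by the start of the string or by whitespace, `+bar` inside `foo+bar` is not picked up.

```javascript
// Hypothetical helper: returns group 1 of every match.
function findMarked(text) {
  const pattern = /(?:^|\s)(\+[^+\s.!]*)(?=[\s.!])/g;
  return Array.from(text.matchAll(pattern), (match) => match[1]);
}

console.log(findMarked('Lorem ipsum +dolor ++sit amet.')); // [ '+dolor' ]
console.log(findMarked('foo+bar baz'));                    // []
```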
Avinash Raj
0

Just try the following regex:

(^|\s)\+\w+
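For illustration only (the helper name `findWithWordChars` is made up), this is roughly how the pattern behaves in JavaScript. Two things to be aware of: the match includes the leading whitespace, so the sketch trims it, and `\w` only covers `[A-Za-z0-9_]`, so a word starting with a non-ASCII letter is not matched.

```javascript
// Hypothetical helper: returns every match with the leading whitespace trimmed.
function findWithWordChars(text) {
  const pattern = /(^|\s)\+\w+/g;
  const matches = text.match(pattern) || []; // match() returns null when nothing is found
  return matches.map((m) => m.trim());
}

console.log(findWithWordChars('Lorem ipsum +dolor ++sit amet.')); // [ '+dolor' ]
console.log(findWithWordChars('Lorem +\u00fcber'));               // []
```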
hsz