I would like to compose a regular expression to highlight keywords.

The regex looks something like this:

\btap\b

For the sentence below, I expect it to match only the first tap, the one that is not enclosed in double quotes. In practice, it also matches the second tap, the one inside the quotation marks.

tap click "tap"

How can I exclude the second, quoted tap from being matched?
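The behavior described above can be reproduced with a short snippet. This is only a sketch, assuming JavaScript (the language named in the comments) and a global flag so that every match is found; the variable names are illustrative.

```javascript
// Reproduces the behavior described in the question with the original pattern.
const text = 'tap click "tap"';
const pattern = /\btap\b/g;

// Collect the start index of every match.
const positions = [];
let m;
while ((m = pattern.exec(text)) !== null) {
  positions.push(m.index);
}

// Both occurrences match, because \b also matches between a quote and a letter.
console.log(positions); // [ 0, 11 ]
```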

HamZa
Chuck
  • Are you trying to match phrases with only _one_ `tap`, or are you looking for a regex which matches the _first_ `tap` ? – Tim Biegeleisen Nov 23 '15 at 08:59
  • What language/tool are you using? – HamZa Nov 23 '15 at 08:59
  • You may use lookarounds to disallow quotes around the word: [`(?<!")\btap(?!")\b`](https://regex101.com/r/oJ0vG4/1). – Wiktor Stribiżew Nov 23 '15 at 09:00
  • @stribizhev the problem with that is that it won't match `"tap` or `tap"`. The next question would be: is that a problem? – HamZa Nov 23 '15 at 09:01
  • I am trying to match every `tap` that is not in double quotes. In my case, only the unquoted `tap` should be matched. – Chuck Nov 23 '15 at 09:01
  • Then the regex by @stribizhev should do the job. But you should also consider other types of quotes, punctuation (e.g. end of sentence `tap.`) etc. – Tim Biegeleisen Nov 23 '15 at 09:02
  • I am using JavaScript. – Chuck Nov 23 '15 at 09:02
  • For example, if the sentence is `tap click "tap" tap`, then two occurrences of `tap` should be matched. – Chuck Nov 23 '15 at 09:04
  • @Chuck: That should have been added when you posted. As the regex tag info states, all questions with this tag should also include a tag specifying the applicable programming language or tool. Use `.replace(/(^|[^"])\b(tap)(?!")/g, "$1$2")` ([regex demo](https://regex101.com/r/xV1bY7/1)). – Wiktor Stribiżew Nov 23 '15 at 09:04
  • One trick would be to match `("?)\btap\b\1` and check if group 1 is empty or not. – HamZa Nov 23 '15 at 09:06
  • `exclude the second tap word from being matched` ... in the above comment you implied you want to _include_ the two occurrences. Can you please update your question with exactly what you are trying to do, along with the JS code? – Tim Biegeleisen Nov 23 '15 at 09:07
  • Be cautious with `\b`. JavaScript regexes do not use the Unicode definition of "letters", only ASCII. So `\b` will match the string "tap" if it occurs between non-ASCII alphabetic characters. Unicode support is only coming in ES6 with the `u` flag (not yet implemented in browsers). – Touffy Nov 23 '15 at 09:27
  • One alternative that I always use is defining custom delimiters that behave similarly to a word boundary: `'tap click "tap"'.replace(/(^|[^\w"-])tap/g, '$1XXX')` – hwnd Nov 23 '15 at 09:45

2 Answers

This seems to work fine.

var reg = new RegExp('\\b(tap(?!"))', 'ig');

('tap click "tap" tap.').match(reg); // ["tap", "tap"]

Rules

  1. The match must begin at the start of a word (\b).

  2. The word must not be followed immediately by a double quote.

  3. The match is case insensitive.
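As a rough check of the rules above, the sketch below runs the same pattern against the sample sentence and two edge cases. The extra input strings are made up for illustration and are not from the question.

```javascript
// Pattern from the answer: word boundary, "tap", not followed by a double quote.
const reg = new RegExp('\\b(tap(?!"))', 'ig');

// The quoted occurrence is skipped; the other two are matched.
const sampleMatches = 'tap click "tap" tap.'.match(reg);
console.log(sampleMatches); // [ 'tap', 'tap' ]

// Caveat 1: only the closing quote is checked, so an unclosed quote still matches.
const unclosedMatches = 'say "tap now'.match(reg);
console.log(unclosedMatches); // [ 'tap' ]

// Caveat 2: there is no trailing \b, so the pattern also matches inside longer words.
const longerWordMatches = 'tapping'.match(reg);
console.log(longerWordMatches); // [ 'tap' ]
```

If matches inside longer words are unwanted, appending a trailing \b after the lookahead is one option to consider.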

Fiddle

Stark Buttowski
  • I finally got this pattern working for my requirement, based on your answer: `\b((?<!\")tap(?!\"))\b` – Chuck Nov 24 '15 at 01:44

A word boundary \b matches at any position between a word character and a non-word character, so it also matches between the " and the t.

You can simulate your own word boundaries and include only the delimiters you consider appropriate.

For example:

\s|^|\.|!|\?|$ - space or start of string, or dot, or exclamation mark, or question mark, or end of string

I would also suggest using negative lookbehinds and lookaheads, but...

JavaScript engines before ES2018 don't support lookbehinds.

So you could use some capturing groups and then use the group which you need.

Sample regex: (?:\s|^|\.|!|\?)(tap)(\s|$|\.|!|\?)

Then, in JavaScript, use the first capturing group, match[1].
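Here is a minimal sketch of the capturing-group approach. It is a variation on the sample regex, not the exact pattern: the leading delimiter is captured so the keyword's position can be computed, which makes the keyword group 2 instead of group 1, and the trailing delimiter is a lookahead so it is not consumed and two keywords separated by a single space are both found. The function name findKeyword and the test sentence are hypothetical.

```javascript
// Custom "word boundaries": start of string or whitespace . ! ? before the keyword,
// end of string or whitespace . ! ? after it. Quotes are deliberately not delimiters.
const delimited = /(^|[\s.!?])(tap)(?=$|[\s.!?])/g;

// Returns the keyword and its start index for every unquoted occurrence.
function findKeyword(text) {
  const hits = [];
  let match;
  while ((match = delimited.exec(text)) !== null) {
    // match[1] is the leading delimiter (possibly empty), match[2] is the keyword.
    hits.push({ word: match[2], index: match.index + match[1].length });
  }
  return hits;
}

const hits = findKeyword('tap click "tap" tap tap.');
console.log(hits.map(function (h) { return h.index; })); // [ 0, 16, 20 ]
```

The start indexes can then be used to wrap each occurrence in whatever highlight markup is needed.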

See this SO answer for details on how to use capturing groups in JavaScript.

Community
StoYan