-2

I've been trying to do this forever. I can match the first letter of every word, but I can't exclude words that are in parentheses.

For example:

I can't (do) this, please (help) me.

So this should match only I, c, t, p, m.

Using \b\w matches the first letter of every word, but it doesn't exclude words in parentheses. I've also tried a negative lookahead, but it seems I can't get it right:

(?!\(()\))\b\w

I also have a problem with Unicode. Using (?:^| )[a-z]{1} or \b\w only matches Latin letters, and I will sometimes have text in other scripts, for example:

I am (someone) ვიღაც.

In this situation the regex will only match I, a and s, not ვ. Thanks
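
A minimal sketch that reproduces the problem in JavaScript, using the two sample sentences above as input. The helper name `naiveFirstLetters` is made up for illustration only:

```javascript
// Hypothetical helper: collects everything the naive \b\w pattern matches.
function naiveFirstLetters(text) {
  // match() returns null when nothing matches, so fall back to an empty array.
  return text.match(/\b\w/g) || [];
}

// The result includes "d" and "h" from the parenthesized words,
// and an extra "t" because the apostrophe in "can't" creates a word boundary.
console.log(naiveFirstLetters("I can't (do) this, please (help) me."));

// The Georgian word contributes nothing: without extra handling,
// \w only covers [A-Za-z0-9_].
console.log(naiveFirstLetters("I am (someone) ვიღაც."));
```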

bobble bubble
Messing
  • 4
    Try: `"I can't (do) this, please (help) me".match(/(?:^| )[a-z]{1}/gi)`. Not tested though. –  Jul 04 '16 at 14:05
  • 1
    Please note: To avoid down votes show what you've tried so far –  Jul 04 '16 at 14:08
  • Thanks! It works, but it also matches spaces before letters. "I", " c", " t", " p" ... – Messing Jul 04 '16 at 14:18
  • 2
    OK, when you want to match non-latin chars too, then you shouldn't use `RegExp`, because even in ES6 there's no support for unicode char classes. –  Jul 04 '16 at 14:30
  • Alright, thanks, didn't know that. So I should match one char after space. – Messing Jul 04 '16 at 14:33

2 Answers

1

This one catches only the first letter of words:

(?<=[^(])\b\w

This is a positive lookbehind (from https://regex101.com/):

Ensures that the given pattern will match, ending at the current position in the expression. Does not consume any characters.

/(?<=foo)bar/

foobar matches; foobaz doesn't match.

For non-Latin characters, I can't help you.

baddger964
  • Thanks for the answer, but positive lookbehind isn't supported in JavaScript. I forgot to mention JS in the OP, but this question is tagged with javascript. – Messing Jul 04 '16 at 15:08
1

Different things to be considered.

  1. First, you need to define which characters count as letters, since they can also be non-Latin. See this answer and its comments. To match a letter, let's use [\u00C0-\u1FFF\u2C00-\uD7FF\w]

  2. As you want to do this in JavaScript, regex is limited. A word boundary \b cannot be used, as it does not respect the specified letter range, and lookbehind is not available. Instead, we need a negated class of the specified letters, something like (?:^|[^'\u00C0-\u1FFF\u2C00-\uD7FF\w-]), to act as a "word boundary". Here I also added ' to avoid matches inside words such as can't

  3. Use a negative lookahead to check that the letter is outside parentheses: (?![^(]*\))
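
To see what the lookahead from step 3 does on its own, here is a small sketch that applies it to every word character rather than only to first letters. The function name `charsOutsideParens` is hypothetical, and the approach assumes parentheses are balanced and not nested:

```javascript
// Hypothetical helper: returns each word character that is NOT followed by
// a ")" without an intervening "(", i.e. characters outside parentheses.
function charsOutsideParens(text) {
  // (?![^(]*\)) fails when a ")" can be reached before any "(" appears.
  return text.match(/\w(?![^(]*\))/g) || [];
}

console.log(charsOutsideParens("ab (cd) ef"));
```

With nested or unbalanced parentheses the check can misfire, because it only looks for the next ")" and never counts opening ones.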

Altogether, the pattern would look like:

(?:^|[^'\u00C0-\u1FFF\u2C00-\uD7FF\w])([\u00C0-\u1FFF\u2C00-\uD7FF\w])(?![^(]*\))

See this fiddle and demo at regex101
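
Since the fiddle is not reproduced here, the following is only a sketch of how the pattern above might be used in JavaScript. Because the "word boundary" part consumes a character, the letter has to be read from capture group 1 rather than from the whole match. The names `FIRST_LETTER` and `firstLetters` are made up for illustration:

```javascript
// The pattern from the answer, with the global flag so exec() can iterate.
const FIRST_LETTER =
  /(?:^|[^'\u00C0-\u1FFF\u2C00-\uD7FF\w])([\u00C0-\u1FFF\u2C00-\uD7FF\w])(?![^(]*\))/g;

// Hypothetical helper: collects capture group 1 (the letter itself)
// from every match, skipping the consumed boundary character.
function firstLetters(text) {
  const letters = [];
  FIRST_LETTER.lastIndex = 0; // reset state, since the regex is shared
  let match;
  while ((match = FIRST_LETTER.exec(text)) !== null) {
    letters.push(match[1]);
  }
  return letters;
}

console.log(firstLetters("I can't (do) this, please (help) me."));
console.log(firstLetters("I am (someone) ვიღაც."));
```

Using `String.prototype.match` with the `g` flag would return the whole matches, including the leading space, which is the issue raised in the comments under the question.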

Community
bobble bubble
  • Great `RegExp` stunt! Not very reliable though. Probably too many exceptions: `"code-breaking text".match(/(?:^|[^'a-z\w])([a-z\w])(?![^(]*\))/gi)` –  Jul 04 '16 at 17:10
  • @LUH3417 thank you for comment (: you mean because of the `-`? [It can be modified to needs](https://regex101.com/r/eJ5nE8/2). – bobble bubble Jul 05 '16 at 10:15
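
The linked demo is not reproduced here. As a hedged sketch of one possible modification, assuming the goal is to treat hyphenated compounds as single words, `-` can be added to the negated "boundary" class (as in step 2 of the answer). The names below are hypothetical:

```javascript
// Boundary class without "-": a hyphen starts a new word.
const WITHOUT_HYPHEN =
  /(?:^|[^'\u00C0-\u1FFF\u2C00-\uD7FF\w])([\u00C0-\u1FFF\u2C00-\uD7FF\w])(?![^(]*\))/g;
// Boundary class with "-": a letter right after a hyphen is not a word start.
const WITH_HYPHEN =
  /(?:^|[^'\u00C0-\u1FFF\u2C00-\uD7FF\w-])([\u00C0-\u1FFF\u2C00-\uD7FF\w])(?![^(]*\))/g;

// Hypothetical helper: collects capture group 1 for any of the patterns above.
function collectFirstLetters(pattern, text) {
  const regex = new RegExp(pattern.source, "g"); // fresh copy, no shared lastIndex
  const letters = [];
  let match;
  while ((match = regex.exec(text)) !== null) {
    letters.push(match[1]);
  }
  return letters;
}

console.log(collectFirstLetters(WITHOUT_HYPHEN, "code-breaking text"));
console.log(collectFirstLetters(WITH_HYPHEN, "code-breaking text"));
```

The first call also reports the "b" of "breaking"; the second skips it.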