I have a text and I need to match every part of it except certain given words, using a regexp.

For example, if the text is ' Something went wrong and I could not do anything ' and the given words are 'and' and 'not', then the result must be ['Something went wrong', 'I could', 'do anything'].

Please don't advise me to use string.split(), string.replace(), etc. I know several ways to do this with built-in methods. I'm wondering whether there is a regex that can do this when I execute text.match(/regexp/g).

Please note that the regular expression must work at least in Chrome, Firefox and Safari, in versions no more than 3 releases behind the current one! At the time of asking this question, the current versions are 100.0, 98.0.2 and 15.3 respectively. For example, you cannot use the lookbehind feature in Safari.

Please, before answering my question, go to https://regexr.com/ and check your answer! Your regular expression should highlight all parts of the sentence except the given words, including the spaces between the words inside the needed parts but excluding the empty spaces around those parts.

Before asking this question I did my own search, but these links didn't help me. I also tried the non-accepted answers:

Match everything except for specified strings

Regex: match everything but a specific pattern

Regex to match all words except a given list

Regex to match all words except a given list (2)

Need to find a regular expression for any word except word1 or word2

Matching all words except one

Javascript match eveything except given words

InSync
EzioMercer
  • Can you share the actual regexes you've applied that haven't worked? This just looks like you linked to similar questions without showing how you tried to adapt them to your problem. – Dan Csharpster Apr 08 '22 at 13:43
  • @DanCsharpster Do you expect me to remember all the variants I tried? :) I tried changing a lot of different symbols but couldn't adapt them to my problem. If you are sure that the answer is already given in the list of links, then please share a link to the correct answer, but first please check the answer on https://regexr.com/ as I mentioned before – EzioMercer Apr 08 '22 at 13:56
  • I'm saying those links are different approaches and how they were applied could make a difference. One of those might work but there may be an issue with how it was applied. Sharing actual code would make it easier for people to help you out. Links are useful for reference but its asking a lot for contributors to look at each of those links and then guess how you applied those techniques to your problem. – Dan Csharpster Apr 08 '22 at 15:17
  • @DanCsharpster The best regexp was `(?!(and|not))\b\w+`, but it selects the words separately and I want the whole parts – EzioMercer Apr 08 '22 at 21:54
  • Just in case you drop that terrible Safari browser: `/(?<=^|and|not).*?(?=and|not|$)/g`. – Poul Bak Apr 08 '22 at 23:58
  • @PoulBak Unfortunately I can not do this as I did it with IE :) But thank you! – EzioMercer Apr 09 '22 at 00:36

2 Answers

It's possible using only match and lookaheads in JavaScript.

/\b(?=\w)(?!(?:and|not)\b).*?(?=\s+(?:and|not)\b|\s*$)/gi

Test on RegExr here

Basically, match the start of a word that's not a restricted word:
\b(?=\w)(?!(?:and|not)\b)
Then lazily match up to the whitespace that precedes the next restricted word, or up to the end of the line, without including the trailing whitespace:
.*?(?=\s+(?:and|not)\b|\s*$)

Test snippet:

const re = /\b(?=\w)(?!(?:and|not)\b).*?(?=\s+(?:and|not)\b|\s*$)/gi

let str = `   Something went wrong    and    I could   not   do anything   `;
let arr = str.match(re);
console.log(arr);
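
Since the words to exclude are "given", the same pattern can also be assembled from a list at run time. The following is only a minimal sketch and not part of the original answer: the helper names `escapeForRegex` and `buildExceptWordsRegex` are hypothetical, and it assumes a non-empty list of plain words. The `replace` call is used only to escape the words while building the pattern; the extraction itself is still a single `match`.

```javascript
// Hypothetical helper: escape characters that are special in a regex,
// so that each word is treated literally inside the pattern.
function escapeForRegex(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: builds the pattern shown above for a non-empty word list.
function buildExceptWordsRegex(words) {
  const alternation = words.map(escapeForRegex).join('|');
  return new RegExp(
    '\\b(?=\\w)(?!(?:' + alternation + ')\\b).*?(?=\\s+(?:' + alternation + ')\\b|\\s*$)',
    'gi'
  );
}

const exceptWordsRegex = buildExceptWordsRegex(['and', 'not']);
const parts = ' Something went wrong and I could not do anything '.match(exceptWordsRegex);
console.log(parts); // [ 'Something went wrong', 'I could', 'do anything' ]
```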
LukStorms
  • This is beautiful. It trims the extra whitespace around the matches and doesn't capture an empty trailing group either. Just one minor thing, you don't need the `m` flag. – Besworks Apr 09 '22 at 09:43
  • @Besworks Thx, the `m` can indeed be removed if it's only a string with 1 line. But it's debatable whether it's useful for multiline strings with this regex. – LukStorms Apr 09 '22 at 11:01
  • "Your regular expression should highlight all parts of a sentence, including spaces". – Poul Bak Apr 09 '22 at 11:51
  • @PoulBak The expected array doesn't contain the leading nor trailing spaces. – LukStorms Apr 09 '22 at 14:10
  • @PoulBak Sorry if this sentence confused you! I meant that it should highlight not the separate words but the words together with the spaces between them. I had found this regexp `(?!(and|not))\b\w+`, and it selects all the words without the spaces between them – EzioMercer Apr 09 '22 at 14:37

See Edit further down.

You can use this regex, which only uses lookahead:

/(?!and|not)\b.*?(?=and|not|$)/g

Explanation:

(?!and|not) - negative look ahead for and or not

\b - match a word boundary, to prevent a match from starting at the leftover "nd" or "ot" inside a forbidden word

.*? - match any char zero or more times, as few as possible

(?=and|not|$) - look ahead for and or not or end of text

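If your text has multiple lines, you can add the m flag (multiline) so that $ also matches at the end of each line. Alternatively, you can replace the dot (.) with [\s\S] so that a match can continue across line breaks.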
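
To see what this first version returns, here is a small sketch that runs it on the example sentence from the question (the variable names are only illustrative). Each part still carries the spaces that were next to a forbidden word or at the end of the text.

```javascript
// First version of the regex, applied to the question's example sentence.
const firstRegex = /(?!and|not)\b.*?(?=and|not|$)/g;
const sampleText = ' Something went wrong and I could not do anything ';

const firstResult = sampleText.match(firstRegex);
console.log(firstResult);
// [ 'Something went wrong ', ' I could ', ' do anything ' ]
```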

Edit:

I have changed it a little so spaces around the forbidden words are removed:

/(?!and|not)\b\w.*?(?= and| not|$)/g

I have added a \w character match to push the start of the match after the space and added spaces in the look ahead.
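
As a quick check, here is a sketch of the edited regex on the question's example (variable names are illustrative). The spaces next to the forbidden words are gone, but with a single trailing space in the input the last part still keeps it, because the lookahead only stops at `$` itself.

```javascript
// Edited regex: matches start on a word character and stop before " and" / " not".
const editedRegex = /(?!and|not)\b\w.*?(?= and| not|$)/g;
const exampleText = ' Something went wrong and I could not do anything ';

const editedResult = exampleText.match(editedRegex);
console.log(editedResult);
// [ 'Something went wrong', 'I could', 'do anything ' ]
```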

Edit2: (to handle multiple spaces around words):

You were very close! All you need is a \s* before the dollar sign and specified words:

/(?!and|not|\s)\b.*?(?=\s*(and|not|$))/g

Updated link: regexr.com
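
A short sketch of this final version on an input with several spaces around the words (variable names are illustrative). Because `\s*` sits in front of both the forbidden words and `$`, runs of whitespace on either side of a part are left out of the match.

```javascript
// Final regex: a match cannot start on whitespace, and it stops before any
// run of whitespace that leads to a forbidden word or to the end of the text.
const finalRegex = /(?!and|not|\s)\b.*?(?=\s*(and|not|$))/g;
const spacedText = '   Something went wrong    and    I could   not   do anything   ';

const finalResult = spacedText.match(finalRegex);
console.log(finalResult);
// [ 'Something went wrong', 'I could', 'do anything' ]
```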

EzioMercer
Poul Bak
  • Thank you very much! Your regexp is much shorter than the one in @LukStorms' answer, but it selects an extra space at the start and at the end of the selected parts. If you can remove these extra spaces and keep your regexp as short, that would be great – EzioMercer Apr 09 '22 at 14:49
  • I misunderstood the part about spaces, now surrounding spaces are removed from the matches. – Poul Bak Apr 10 '22 at 00:58
  • Thank you very much! Can you please help me again? I want to ignore all empty spaces around the needed parts. I tried to modify your regexp to `/(?!and|not|\s)\b.*?(?=\s*and|\s*not|$)/g`, and it ignores all the surrounding empty spaces except those after the last part. What do I need to add or modify? – EzioMercer Apr 10 '22 at 04:39
  • You missed adding `\w` in the regex, that takes care of that, so my edit is what you want. – Poul Bak Apr 10 '22 at 13:45
  • If it doesn't work: Edit your question and include an example, that doesn't work. – Poul Bak Apr 10 '22 at 14:31
  • I updated my question. If I have extra empty spaces at the end of the sentence, then the regex will include them too. Here is my [RegExp](https://regexr.com/6j9ud), which I got when I tried to adapt your regexp – EzioMercer Apr 10 '22 at 15:50
  • Thank you very much! It was a stupid mistake :) Your regexp works perfectly! This regexp `(?!and|not|\s)\b.*?(?=\s*and|\s*not|\s*$)` also works, with just a small difference: you search for parts that start with `\w`, whereas I ignore the leading empty spaces with `|\s` – EzioMercer Apr 12 '22 at 11:28
  • Just a little update :) You can write `\s*` only one time at the second part of regexp `/(?!and|not|\s)\b.*?(?=\s*(and|not|$))/g` – EzioMercer Apr 20 '22 at 14:43
  • That's absolutely correct and does look better. – Poul Bak Apr 20 '22 at 14:45