
I have tried many ways to do this with a regex, but my regex skills are not strong.

My situation: I have the string `What is this and this is cool`. I need to split on `is`, but I don't want to split the `is` in `What is`. I only want to split on the `is` between `this` and `cool`.

I tried the regex `(?!What)....(\sis\s)`, but it returns `this is`, and I need only the second `is`.

Thanks in advance.

ViTUu
  • What about `What is this and this thing is cool` ? – revo Mar 03 '18 at 22:38
  • @revo, I am working with NLP and some people write crazy sentences, and sentences like that break the flow :( – ViTUu Mar 03 '18 at 22:41
  • I mean what if a `thing` is between `this` and `cool`? Should it split at `is` between them? – revo Mar 03 '18 at 22:42
  • If I split on 'is' it returns `What`, `this and this`, `cool`, and I expected it to return `What is this and this`, `cool`. I need the first thing after is, and the first thing after this is wrong :( – ViTUu Mar 03 '18 at 22:44
  • Please read my sample input carefully *What is this and this **thing** is cool* and say whether or not it should split at second `is`. – revo Mar 03 '18 at 22:45
  • What is the general case you are trying to handle? – Pierre Emmanuel Lallemant Mar 03 '18 at 22:45
  • I am trying to handle the situation where I have some words to split on, but I don't want to split when a specific word, like `What` in this case, comes before. I just want to understand how to do that because I want to create other patterns – ViTUu Mar 03 '18 at 22:47
  • I don't know if it's possible to have a pretty regex, but you can split your string on `" "` to get all the words, then loop over them looking for the 'is' occurrences and handle your custom rules. – Pierre Emmanuel Lallemant Mar 03 '18 at 22:59
  • I will do that, but I still have hope :P Thanks for your time – ViTUu Mar 03 '18 at 23:04
  • Check if this is what you are trying to do http://jsbin.com/koqozozebi/edit?js,console – revo Mar 03 '18 at 23:46
  • You need to construct groups with positive or negative lookbehind. [See the answers here](https://stackoverflow.com/questions/2973436/regex-lookahead-lookbehind-and-atomic-groups). For example, `(?<!What)\sis` will return the first occurrence of " is" not preceded by "What", and `(?<=this)\sis\s` will return the first occurrence of " is " preceded by "this" – scraaappy Mar 03 '18 at 23:47
  • Some crazy workaround, see [this fiddle](https://jsfiddle.net/9m90sr7k/6/). – Wiktor Stribiżew Mar 03 '18 at 23:47
  • Thanks guys. scraaappy, I think `?<` doesn't work in JavaScript. revo, very nice regex, I will study it and test some cases on my server. Wiktor Stribiżew, I will read your workaround, maybe it will solve it for me. – ViTUu Mar 04 '18 at 00:41
  • Did the answer below help? Or should I improve or remove it? – revo Mar 05 '18 at 13:28

1 Answer

TL;DR

A one-liner regex solution:

(?:\bis\b)?((?:(?!\bis\b)(?:What\s+is\b)?.?)+)

Regex live demo

JS code:

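// Match a standalone "is" outside the group; capture everything else in group #1.
var re = /(?:\bis\b)?((?:(?!\bis\b)(?:What\s+is\b)?.?)+)/;
console.log(
  "What is this and is What is this is cool What is this is"
  .split(re)
  // split() also emits empty strings between matches; drop them.
  .filter(Boolean)
);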
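As a quick check against the sentence from the question, the same expression can be wrapped in a small helper. The name `splitOnIs` is made up for this sketch and is not part of the original code:

```javascript
// Hypothetical helper: split on the standalone word "is",
// but keep "What is" together.
function splitOnIs(text) {
  const re = /(?:\bis\b)?((?:(?!\bis\b)(?:What\s+is\b)?.?)+)/;
  // split() also returns empty strings between matches; filter(Boolean) drops them.
  return text.split(re).filter(Boolean);
}

const parts = splitOnIs("What is this and this is cool");
console.log(parts); // [ 'What is this and this ', ' cool' ]
```

The pieces keep their surrounding whitespace; adding `.map(s => s.trim())` would remove it if that matters.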

Whole philosophy

This regex matches the word `is` as soon as it steps onto one. Otherwise it keeps matching either a `What is` occurrence or any other characters, as long as they are not a standalone `is`. In other words, it matches `is` and captures everything else.

The trick here is a tempered token that checks whether the next occurrence is an `is` or not. If it is not, the regex tries to match a `What is` or a single character. This process continues until it meets an `is`.

  • `(?:\bis\b)?` Try to match the word `is`
  • `(` Start of capturing group #1
    • `(?:` Start of non-capturing group
      • `(?!\bis\b)` Assert that the next word is not `is`
      • `(?:What\s+is\b)?.?` If it is not, optionally match `What is`, then a single character or nothing
    • `)+` Repeat as many times as possible
  • `)` End of capturing group #1
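The tempered token can be seen in isolation in the sketch below. It leaves out the `What is` exception and only shows how `(?!\bis\b).` consumes one character at a time until a standalone `is` comes next. The variable names are illustrative:

```javascript
// Tempered token: consume one character at a time, but only while the
// upcoming text is not the standalone word "is".
const tempered = /^(?:(?!\bis\b).)+/;

// The "is" inside "this" is not a whole word, so it is consumed normally.
const first = "this and this is cool".match(tempered)[0];
console.log(JSON.stringify(first)); // "this and this "

// Without the exception for "What is", the run stops at the very first "is".
const second = "What is this".match(tempered)[0];
console.log(JSON.stringify(second)); // "What "
```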

The `split()` method adds the text matched by capturing groups to the output array, which is why the regex matches `is` outside the group and captures everything else inside it.
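A short sketch of that `split()` behaviour, first with a toy separator and then with the regex from this answer before the empty strings are filtered out (the variable names are assumptions made for the example):

```javascript
// A separator without a capturing group is dropped from the result.
const plain = "a-b-c".split(/-/);
console.log(plain); // [ 'a', 'b', 'c' ]

// With a capturing group, the captured text is spliced into the result.
const captured = "a-b-c".split(/(-)/);
console.log(captured); // [ 'a', '-', 'b', '-', 'c' ]

// Raw output of the regex above: the captured chunks are surrounded by
// empty strings, which is why filter(Boolean) is applied afterwards.
const splitter = /(?:\bis\b)?((?:(?!\bis\b)(?:What\s+is\b)?.?)+)/;
const raw = "What is this and this is cool".split(splitter);
console.log(raw); // [ '', 'What is this and this ', '', ' cool', '' ]
```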

To avoid splitting after words other than `What`, you only need to list each word as an alternative within a group:

(?:\bis\b)?((?:(?!\bis\b)(?:(?:What|How|Who)\s+is\b)?.?)+)
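If the list of protected words changes often, the pattern could be assembled from an array. This is only a sketch: `buildSplitter` is a hypothetical helper, and it assumes the words are plain letters, so no regex escaping is applied.

```javascript
// Hypothetical helper: build the splitter from a list of protected words.
// Assumption: the words contain no regex metacharacters.
function buildSplitter(words, flags = "") {
  const alternation = words.join("|");
  return new RegExp(
    "(?:\\bis\\b)?((?:(?!\\bis\\b)(?:(?:" + alternation + ")\\s+is\\b)?.?)+)",
    flags
  );
}

const multi = buildSplitter(["What", "How", "Who"]);
const pieces = "Who is there and that is fine".split(multi).filter(Boolean);
console.log(pieces); // [ 'Who is there and that ', ' fine' ]
```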

You may want to set the `i` flag so that a lowercase `what is` is protected too; otherwise the string is split there.
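The difference can be sketched with a lowercase sentence (the variable names are illustrative). Note that with the `i` flag an uppercase `IS` would also be treated as a separator:

```javascript
const caseSensitive = /(?:\bis\b)?((?:(?!\bis\b)(?:What\s+is\b)?.?)+)/;
const caseInsensitive = /(?:\bis\b)?((?:(?!\bis\b)(?:What\s+is\b)?.?)+)/i;

const sentence = "what is this and this is cool";

// Without the flag, "what is" is not recognised, so the split happens there.
const strict = sentence.split(caseSensitive).filter(Boolean);
console.log(strict); // [ 'what ', ' this and this ', ' cool' ]

// With the flag, "what is" is kept together.
const relaxed = sentence.split(caseInsensitive).filter(Boolean);
console.log(relaxed); // [ 'what is this and this ', ' cool' ]
```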

revo