
I have a string with a pattern similar to the following one:

TITLE.wordX. aaa.: AAAAAAA;AAAAA. BBBB: bbbb.

I want to split this string by ". " to get something like: ['TITLE','wordX. aaa.: AAAAAAA;AAAAA', 'BBBB: bbbb']

The problem is that the string 'wordX. aaa.: AAAAAAA;AAAAA' itself contains a dot, so splitting the string as described above would actually produce: ['TITLE','wordX','aaa.: AAAAAAA;AAAAA', 'BBBB: bbbb']

Therefore, I want a regex that lets me tell the split to find "every dot which is not followed by wordX". Searching online, I found suggestions to use a negative lookahead for such cases, like ^((?!wordX).)*$. However, this has not worked for me (maybe I am not using it the right way).

Given all this, I would like to know how to build a regex that matches every dot that does not have wordX immediately before it and that is followed by a space.

pepito
  • The first and second part of your question do not seem to correspond. Do you want to find a dot only when it is followed by a phrase, or only if it is not followed by a certain phrase? If it's the former, a simple positive lookahead should suffice, e.g. \.(?=wordX). – oriberu Mar 14 '20 at 16:11
  • If you were to split the string on the matching dots would you not get `["TITLE.wordX", " aaa", ": AAAAAAA;AAAAA", " BBBB: bbbb"]`? – Cary Swoveland Mar 14 '20 at 16:13
  • @CarySwoveland maybe it is not clear in the question, but I am splitting by "dot+space" – pepito Mar 14 '20 at 16:16
  • @oriberu I want to do the second thing: find the dots that don't have wordX immediately before them – pepito Mar 14 '20 at 16:17
  • Your last comment directly contradicts the last sentence of your question and that sentence says nothing about the need for the dot to be followed by a space. Please edit. – Cary Swoveland Mar 14 '20 at 16:22
  • I am sorry, I have just edited it – pepito Mar 14 '20 at 16:27

2 Answers


I'm still not completely clear which scenario you want exactly, so here are a couple of options with what is matched in each case.

Positive lookahead:

\.(?=wordX)

TITLE.wordX. aaa.: AAAAAAA;AAAAA. BBBB: bbbb.
     ^

Negative lookahead:

\.(?!wordX)

TITLE.wordX. aaa.: AAAAAAA;AAAAA. BBBB: bbbb.
           ^    ^               ^           ^

Positive lookbehind:

(?<=wordX)\.

TITLE.wordX. aaa.: AAAAAAA;AAAAA. BBBB: bbbb.
           ^

Negative lookbehind:

(?<!wordX)\.

TITLE.wordX. aaa.: AAAAAAA;AAAAA. BBBB: bbbb.
     ^          ^               ^           ^
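
To check the four diagrams above in code, here is a minimal Python sketch. It assumes Python's built-in `re` module (which supports fixed-width lookbehind); the helper name `dot_positions` and the last, combined pattern are illustrative assumptions, not something taken from the question. It prints the 0-based index of every dot each pattern matches.

```python
import re

text = "TITLE.wordX. aaa.: AAAAAAA;AAAAA. BBBB: bbbb."

# Each pattern matches a literal dot under a different lookaround condition.
patterns = {
    "positive lookahead": r"\.(?=wordX)",
    "negative lookahead": r"\.(?!wordX)",
    "positive lookbehind": r"(?<=wordX)\.",
    "negative lookbehind": r"(?<!wordX)\.",
    # Assumption: one reading of the edited question, i.e. a dot that is
    # not preceded by wordX and is followed by a space.
    "negative lookbehind + space": r"(?<!wordX)\.(?= )",
}


def dot_positions(pattern, s):
    """Return the 0-based start index of every match of pattern in s."""
    return [m.start() for m in re.finditer(pattern, s)]


positions = {name: dot_positions(p, text) for name, p in patterns.items()}
for name, found in positions.items():
    print(f"{name}: {found}")
```

The dots in the sample string sit at indices 5, 11, 16, 32 and 44, so the printed lists can be compared directly with the caret positions in the diagrams.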

You should in any case reword your question. Cheers.

oriberu
  • Thank you for your answer. I did reword my question. The alternative I want is the last one you mention, but when I try it at https://www.regextester.com/ it seems to also mark the dot after wordX – pepito Mar 14 '20 at 16:29
  • Oh, I'm sorry, I have tried it on another website and it does work (the problem was that the first page I mentioned was for JavaScript). Thank you! – pepito Mar 14 '20 at 16:34
  • Yeah, JavaScript and lookbehinds have quite the non-history. But it's in the 2018 standard and we can hope for widespread implementation. :) – oriberu Mar 14 '20 at 16:42

Maybe...

^(.*?)\.((.*?)(?<!wordX)\.(.*?))\. (.*)\.

Given:

TITLE.wordX. aaa.: AAAAAAA;AAAAA. BBBB: bbbb.

The three wanted parts are then in groups:

\1 \2 \5

Demo: https://regex101.com/r/hPY8JX/2

Oops, update for JS, where lookbehind may not be available (the three parts are then in groups \1 \2 \3):

^(.*?)\.(wordX\..*)\. (.*)\.

https://regex101.com/r/V49Fiw/1
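
As a rough check of both patterns from this answer, the following Python sketch applies them to the sample string with the standard `re` module and pulls out the relevant groups. The variable names are illustrative assumptions; only the two regexes and the sample string come from the answer.

```python
import re

text = "TITLE.wordX. aaa.: AAAAAAA;AAAAA. BBBB: bbbb."

# First pattern: relies on a negative lookbehind; wanted parts are groups 1, 2, 5.
with_lookbehind = re.compile(r"^(.*?)\.((.*?)(?<!wordX)\.(.*?))\. (.*)\.")

# Second pattern: no lookbehind, anchors on the literal "wordX."; groups 1, 2, 3.
without_lookbehind = re.compile(r"^(.*?)\.(wordX\..*)\. (.*)\.")

m1 = with_lookbehind.match(text)
m2 = without_lookbehind.match(text)

# group() with several indices returns a tuple of those groups.
fields_1 = m1.group(1, 2, 5) if m1 else None
fields_2 = m2.group(1, 2, 3) if m2 else None

print(fields_1)
print(fields_2)
```

On this particular input both patterns yield the same three fields. Note that both hard-code the overall shape of the line (exactly three parts and a trailing dot), so they extract fields rather than split on a general delimiter.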

MDR