
I'm trying to match and replace, in one pass, all occurrences of a specific name in a text (a few sentences). The problem is that this name can be part of another name. Example: I need to replace the name 'Item 1', but the sentence can also contain Item 11, My Item 1, or Item 1 Test, which are known names and must not be touched. The list of known untouchable names is built dynamically.

Technically, I want to express 'match every Item 1 unless it is part of [Item 11 | My Item 1 | Item 1 Test]'.

Example sentence: Only Item 11 left due to the promotion on Item 1.

I'd like to replace Item 1 with 'Something'.

Expected output: Only Item 11 left due to the promotion on Something.

Is this possible to achieve with regex?

  • Does each entry (`Item 1`, `Item 11`,...) occur in a separate line? – Bee Feb 06 '20 at 18:54
  • @MonkeyZeus, no, it doesn't. I cannot cover my case with word boundaries. This scenario simply will not work: "My Item 1 test" is a known untouchable name, and I need to replace "Item 1" only when it is NOT part of another known untouchable name. – John Constantine Feb 06 '20 at 20:22
  • I don't get it, so you only want precise matches? If so, then use anchors: `^Item 1$`. If you are using a programming language, then a simple string comparison would be much faster. `if( $item === 'Item 1' ){}else{}` – MonkeyZeus Feb 06 '20 at 20:25
  • @Bee, no, they might be just part of the same sentence – John Constantine Feb 06 '20 at 20:25
  • If `Item 12` was used in the sentence then would it be in the dynamic list as well or is this a situation for word boundaries? Where does the dynamic list come from? – MonkeyZeus Feb 06 '20 at 20:31
  • @MonkeyZeus, I cannot. I need to match and replace part of a sentence. I might also have several occurrences of the same name in a sentence. I have added an example to the initial question. – John Constantine Feb 06 '20 at 20:32
  • @MonkeyZeus, yes, it will be a known untouchable name in this case. The dictionary of names comes from one part of the system, and one of the names has been changed. So the untouchable names are simply the subset of those names that contain the renamed name. In another part I have user-defined texts, which I want to update with the new name wherever the old one was used, without accidentally updating names that merely contain it. – John Constantine Feb 06 '20 at 20:33
  • In that case your list of known untouchables would be infinite in length, because you need to account for `Item 13`, `Item 111`, `Item 11111111111111111111111111111119`, etc. – MonkeyZeus Feb 06 '20 at 20:35
  • It is easy with a callback, and it is easy with a PCRE pattern. A PCRE pattern will look like `(?:Untouchable1|Untouchable2|Untouchable3|etc)(*SKIP)(*F)|other|valid|ones` – Wiktor Stribiżew Feb 06 '20 at 22:00
  • @WiktorStribiżew, PCRE is exactly what I need. Unfortunately, it is not supported by C#. I found a NuGet package that makes it usable, and I will take a closer look to see how to achieve this in a short and clear way. – John Constantine Feb 08 '20 at 06:50
  • If you are working in C#, there is no need for PCRE. You may use the pattern above with `Regex.Replace`. Just remove the skip-fail part and add a capturing group so you know which word you matched. Try `Regex.Replace(text, @"(Untouchable1|Untouchable2|Untouchable3|etc)|other|valid|ones", m => m.Groups[1].Success ? m.Groups[1].Value : "some replacement")`. Removing is even easier: `Regex.Replace(text, @"(Untouchable1|Untouchable2|Untouchable3|etc)|other|valid|ones", "$1")` – Wiktor Stribiżew Feb 08 '20 at 12:00
  • @WiktorStribiżew, the first example works, though it requires a small update. I ended up with two similar options: wrapping the valid names in a second group and then using `var result = Regex.Replace(testString, pattern, m => m.Groups[1].Success ? replaceWith : m.Groups[0].Value);`, or using your pattern without a second group, in which case the replacement becomes `var result = Regex.Replace(testString, pattern, m => m.Groups[0].Success && m.Groups[0].Value == nameToReplace ? replaceWith : m.Groups[0].Value);`. Thank you. Can you please add both options, with PCRE and without, as an answer? – John Constantine Feb 10 '20 at 07:40

2 Answers


With PCRE, you could rely on a SKIP-FAIL technique:

(?:Untouchable1|Untouchable2|Untouchable3|other words to keep)(*SKIP)(*F)|other|words|to|match-and-replace

Since you are using .NET (C#) and can run code, you can use a pattern that captures the words you need to replace and merely matches the words you need to keep. Then use a match evaluator to check Group 1: if Group 1 matched, replace; otherwise, keep the match as is. The words to keep must come before the capturing group in the alternation, so that they win when both could match at the same position.

var pattern = @"Untouchable1|Untouchable2|Untouchable3|other words to keep|(other|words|to|match-and-replace)";
var result = Regex.Replace(testString, pattern, m => 
    m.Groups[1].Success ? replaceWith : m.Value);
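
Since the list of untouchable names is built dynamically, the pattern has to be assembled at run time. Below is a minimal sketch of one way to do that, not a definitive implementation. The class `NameReplacer`, the method `ReplaceName`, and its parameters are hypothetical names introduced for illustration. It assumes the names should be treated as literal text (hence `Regex.Escape`) and sorts the untouchables longest first as a safe default.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class NameReplacer
{
    // Hypothetical helper: replaces `name` with `replaceWith` in `text`,
    // leaving every occurrence of an untouchable name as it is.
    public static string ReplaceName(
        string text, string name, IEnumerable<string> untouchables, string replaceWith)
    {
        // Names to keep: escaped as literals, longest first (assumed safe default).
        var keep = untouchables
            .Where(u => u != name)
            .OrderByDescending(u => u.Length)
            .Select(Regex.Escape);

        // Names to keep are plain alternatives; the name to replace
        // goes last and is the only capturing group (Group 1).
        var alternatives = keep.Concat(new[] { "(" + Regex.Escape(name) + ")" });
        var pattern = string.Join("|", alternatives);

        // Group 1 matched: replace. Otherwise an untouchable matched: keep it.
        return Regex.Replace(text, pattern,
            m => m.Groups[1].Success ? replaceWith : m.Value);
    }
}
```

With the question's example, `NameReplacer.ReplaceName("Only Item 11 left due to the promotion on Item 1.", "Item 1", new[] { "Item 11", "My Item 1", "Item 1 Test" }, "Something")` should return `Only Item 11 left due to the promotion on Something.`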
Wiktor Stribiżew

For your specific scenario you could use:

(?<!My )\bItem 1\b(?! Test)

https://regex101.com/r/iBC6Bf/1/
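
For a dynamic list, the same lookaround idea could in principle be generated from the untouchable names. The following is only a sketch under stated assumptions: `LookaroundPattern` and `Build` are hypothetical names, the names are treated as literals, and it relies on .NET allowing a lookbehind inside a lookahead. Each place where the name occurs inside an untouchable becomes one negative lookahead guard, so a prefix and a suffix are checked together rather than independently.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class LookaroundPattern
{
    // Hypothetical helper: builds a pattern that matches `name` only
    // where it is not part of any of the `untouchables`.
    public static string Build(string name, IEnumerable<string> untouchables)
    {
        var guards = new StringBuilder();
        foreach (var u in untouchables)
        {
            if (u == name) continue; // a guard for the name itself would block every match

            // One guard per position where `name` occurs inside the untouchable.
            for (int i = u.IndexOf(name, StringComparison.Ordinal); i >= 0;
                 i = u.IndexOf(name, i + 1, StringComparison.Ordinal))
            {
                var prefix = u.Substring(0, i);
                var rest = Regex.Escape(u.Substring(i)); // name plus suffix

                guards.Append("(?!");
                if (prefix.Length > 0)
                    guards.Append("(?<=" + Regex.Escape(prefix) + ")");
                guards.Append(rest + ")");
            }
        }
        return guards.ToString() + Regex.Escape(name);
    }
}
```

The resulting string can be passed to `Regex.Replace(text, pattern, replacement)`. Note that a replacement string containing `$` would need escaping, for example with a match evaluator instead of a replacement string.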

MonkeyZeus
  • For the specific scenario, yes. For my dynamic case, no. As I described above, I was thinking about prefixes and postfixes, but is it possible to combine several options? Consider this example: I have, let's say, 3 known names or products: Item 1, New Item 1, Old Item 1 Ex. I want to replace 'Item 1' with 'Item 1 Super' in a sentence like 'Here we have Item 1 which is a replacement for Old Item 1 Ex and is different from New Item 1'. – John Constantine Feb 06 '20 at 20:49
  • @JohnConstantine You would have to sort your known untouchables by length in descending order, replace each occurrence with a unique placeholder, perform the `Item 1` -> `Item 1 Super` replacement, and finally bring back the untouchables by replacing the placeholders with the original names. You need programming to do what you envision; pure regex will be impossible. – MonkeyZeus Feb 06 '20 at 21:04
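
The placeholder steps described in the last comment could be sketched as follows. This is an illustration, not a definitive implementation: `PlaceholderReplacer` and `ReplaceName` are hypothetical names, and the sketch assumes the input text contains no Unicode private-use characters (U+E000 to U+F8FF), which it borrows as placeholders, one per untouchable name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PlaceholderReplacer
{
    // Hypothetical helper implementing hide -> replace -> restore.
    public static string ReplaceName(
        string text, string name, IEnumerable<string> untouchables, string replaceWith)
    {
        // Longest untouchables first, as suggested in the comment.
        var ordered = untouchables
            .Where(u => u != name)
            .OrderByDescending(u => u.Length)
            .ToList();

        // Assumption: one private-use character per untouchable is enough.
        if (ordered.Count > 0xF8FF - 0xE000 + 1)
            throw new ArgumentException("Too many untouchable names for this sketch.");

        // Step 1: hide every untouchable behind its placeholder.
        for (int i = 0; i < ordered.Count; i++)
            text = text.Replace(ordered[i], Placeholder(i));

        // Step 2: what is left of `name` is safe to replace.
        text = text.Replace(name, replaceWith);

        // Step 3: bring the untouchables back.
        for (int i = 0; i < ordered.Count; i++)
            text = text.Replace(Placeholder(i), ordered[i]);

        return text;
    }

    // Maps index i to a single private-use character (assumed absent from the input).
    private static string Placeholder(int i) => ((char)(0xE000 + i)).ToString();
}
```

Compared with a single regex pass, this makes one pass over the text per untouchable name, which may matter for long lists.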