
I have this type of text:

string1_dog_bit_johny_bit_string2
string1_cat_bit_johny_bit_string2
string1_crocodile_bit_johny_bit_string2
string3_crocodile_bit_johny_bit_string4
string4_crocodile_bit_johny_bit_string5

I want to find all occurrences of “bit” that occur only between string1 and string2. How do I do this with regex?

I found the question "Regex Match all characters between two strings", but the regex there matches the entire string between string1 and string2, whereas I want to match only parts of that string.

I am doing a global replacement in Notepad++, so I need a regex only; code will not work.

Thank you in advance.

Roman

Roman Mironov

4 Answers

This regex will do the job:

^string1_(?:.*(bit))+.*_string2$
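  • ^ matches the start of the text (or the start of each line if you use the m option, like so: /<regex>/m)
  • $ matches the end of the text (or the end of each line with the m option)
  • . matches any character (except a newline, by default)
  • * means the previous character/expression is repeated 0 or more times
  • + means the previous expression is repeated 1 or more times
  • (?:<stuff>) means a non-capturing group (<stuff> won't be captured as a result of the matching)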
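A quick way to check what this pattern matches outside Notepad++ is a small C# program (C# is the language of the code answer on this page). This is a minimal sketch that assumes .NET's regex flavor behaves like Notepad++'s for these constructs; it runs the pattern on each sample line and reports whether the line matches and what the capturing group (bit) recorded. .NET keeps one entry per repetition of a group in Group.Captures, so the count shows how many "bit"s group 1 actually captured.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample lines from the question.
string[] lines =
{
    "string1_dog_bit_johny_bit_string2",
    "string1_cat_bit_johny_bit_string2",
    "string1_crocodile_bit_johny_bit_string2",
    "string3_crocodile_bit_johny_bit_string4",
    "string4_crocodile_bit_johny_bit_string5",
};

// The pattern from the answer; group 1 is the capturing (bit) group.
var pattern = new Regex(@"^string1_(?:.*(bit))+.*_string2$");

int matchedLines = 0;
foreach (string line in lines)
{
    Match m = pattern.Match(line);
    if (!m.Success)
    {
        Console.WriteLine($"no match: {line}");
        continue;
    }
    matchedLines++;
    // Group.Captures holds every capture the repeated group made in this match.
    Group bit = m.Groups[1];
    Console.WriteLine($"match:    {line} (group 1 at index {bit.Index}, {bit.Captures.Count} capture(s))");
}
```

On the sample lines only the three string1/string2 lines match. For the first line, group 1 holds a single capture at index 22, the second "bit": the greedy .* in the first repetition runs past the earlier "bit", so the pattern picks out the right lines but does not capture every "bit" on them.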
Carlos

You could use ^string1_(.*(bit).*)*_string2$ if you don't care about performance (the nested .* quantifiers can cause a lot of backtracking) or don't have large/many strings to check. The outer parentheses allow multiple occurrences of "bit".

If you tell us which language you want to use, we can give more specific solutions.

Edit: Since you added that you're doing a replacement in Notepad++, I propose the following: use (?<=string1_)(.*)bit(.*)(?=_string2) as the regex and $1xyz$2 as the replacement pattern (replace xyz with your string). Then run "Replace All" until Notepad++ doesn't find any more matches. The drawback is that this regex matches only one "bit" per line per pass, so it has to be applied several times.

By the way, even if a regex matches the whole line, you can still replace only parts of it by using capturing groups.
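The repeated "Replace All" idea can be simulated with a short C# loop. This is a minimal sketch under two assumptions: .NET's regex flavor stands in for Notepad++'s, and "xyz" is only a placeholder replacement. The loop keeps replacing until the pattern no longer matches, just as you would keep pressing Replace All, and the capturing groups put back everything around the replaced "bit".

```csharp
using System;
using System.Text.RegularExpressions;

// Sample text from the question, one entry per line (as in a Notepad++ document).
string text = string.Join("\n",
    "string1_dog_bit_johny_bit_string2",
    "string1_cat_bit_johny_bit_string2",
    "string1_crocodile_bit_johny_bit_string2",
    "string3_crocodile_bit_johny_bit_string4",
    "string4_crocodile_bit_johny_bit_string5");

// Regex and replacement from the answer; "xyz" is a placeholder for your string.
// ${1} and ${2} are the braced forms of $1 and $2, so the digits cannot run into "xyz".
var pattern = new Regex(@"(?<=string1_)(.*)bit(.*)(?=_string2)");
const string replacement = "${1}xyz${2}";

// Emulate pressing "Replace All" until nothing matches any more.
int passes = 0;
while (pattern.IsMatch(text))
{
    text = pattern.Replace(text, replacement);
    passes++;
}

Console.WriteLine($"{passes} passes");
Console.WriteLine(text);
```

On the sample text this needs two passes: each pass replaces one "bit" per qualifying line (the greedy first (.*) means the last remaining "bit" goes first), and the string3/string4 lines are left untouched because the lookbehind never finds string1_ there.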

Fabian

If I understand correctly, here is code that does what you want:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text.RegularExpressions;

    var input = new List<string>
    {
        "string1_dog_bit_johny_bit_string2",
        "string1_cat_bit_johny_bit_string2",
        "string1_crocodile_bit_johny_bit_string2",
        "string3_crocodile_bit_johny_bit_string4",
        "string4_crocodile_bit_johny_bit_string5"
    };

    // The named group "bitGroup" captures each literal "bit".
    var regex = new Regex(@"(?<bitGroup>bit)");
    var allMatches = new List<string>();
    foreach (var str in input)
    {
        // Only search lines that start with string1 and end with string2.
        if (str.StartsWith("string1") && str.EndsWith("string2"))
        {
            var matchCollection = regex.Matches(str);
            allMatches.AddRange(matchCollection.Cast<Match>().Select(match => match.Groups["bitGroup"].Value));
        }
    }

    Console.WriteLine("All matches {0}", allMatches.Count);
Sergey K

You can use the regex:

(?:string1|\G)(?:(?!string2).)*?\Kbit

regex101 demo. I tried it in Notepad++ as well, and it works.

There is a description on the demo site, but if you want more explanation, let me know and I'll elaborate!
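The same approach can be tried in C#, with one adjustment: .NET's regex engine supports \G but not \K. This is a minimal sketch, assuming the .NET flavor as a stand-in for Notepad++'s, that captures the text in front of "bit" in group 1 and writes it back unchanged, which leaves the next match starting at the same place as discarding that text with \K would. "xyz" is a placeholder replacement, and the whole sample is processed as one string, the way Notepad++ sees a document.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample text from the question, joined into one multi-line string.
string text = string.Join("\n",
    "string1_dog_bit_johny_bit_string2",
    "string1_cat_bit_johny_bit_string2",
    "string1_crocodile_bit_johny_bit_string2",
    "string3_crocodile_bit_johny_bit_string4",
    "string4_crocodile_bit_johny_bit_string5");

// (?:string1|\G)        start at "string1" or exactly where the previous match ended
// (?:(?!string2).)*?    lazily walk forward, never past "string2" (and "." never crosses a newline)
// Group 1 replaces \K: it holds the text before "bit" so the replacement can keep it.
var pattern = new Regex(@"((?:string1|\G)(?:(?!string2).)*?)bit");
string result = pattern.Replace(text, "${1}xyz");

Console.WriteLine(result);
```

One caveat worth checking: when there is no previous match, \G also matches at the position where matching starts (the beginning of the text), so if the first line did not begin with string1, its "bit"s would be replaced as well. With the sample text the first line starts with string1, so only the three string1/string2 lines change.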

Jerry