
I would like to use the ((?!(SEPARATOR)).)* regex pattern for splitting a string.

using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        var separator = "__";
        var pattern = String.Format("((?!{0}).)*", separator);
        var regex = new Regex(pattern);

        foreach (var item in regex.Matches("first__second"))
            Console.WriteLine(item);        
    }
}

It works fine when the SEPARATOR is a single character, but when it is longer than 1 character I get an unexpected result. In the code above, the second non-empty match is "_second" instead of "second". How should I modify my pattern to skip the whole unmatched separator?

My real problem is to split lines where I should skip line separators inside quotes. My line separator is not a predefined value and it can be for example "\r\n".

Wiktor Stribiżew
  • No, just split with that pattern. If it is a regex pattern, use `Regex.Split`, if it is a literal fixed string like `__`, just use `string.Split`. You won't be able to achieve what you want by *matching* in a .NET regex. In PCRE, you would use `(*SKIP)(*FAIL)` verbs, but they are not supported in .NET. – Wiktor Stribiżew Jul 13 '17 at 10:23
  • Why not use `string.Split` for this? Ex. `"first__second".Split(new[] { "__" }, StringSplitOptions.None);` – ZarX Jul 13 '17 at 10:24
  • I would like to use regex because my pattern is more complicated. String.Split is not enough for my purpose. – user1701074 Jul 13 '17 at 10:29
  • And what is your *real* problem then? Right now, your question is a dupe of [string.split - by multiple character delimiter](https://stackoverflow.com/questions/1254577/string-split-by-multiple-character-delimiter). – Wiktor Stribiżew Jul 13 '17 at 10:29
  • My real problem is to split lines where I should skip line separators inside quotes. My line separator is not a predefined value and it can be for example "\r\n". – user1701074 Jul 13 '17 at 10:44
  • So, the real scenario differs from what you posted a lot. Try `Regex.Matches(s, "(?:\"[^\"]*\"|[^\r\n\"])+")` – Wiktor Stribiżew Jul 13 '17 at 11:11

2 Answers


You can do something like this:

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = "plum--pear";
      string pattern = "-";            // Split on hyphens

      string[] substrings = Regex.Split(input, pattern);
      foreach (string match in substrings)
      {
         Console.WriteLine("'{0}'", match);
      }
   }
}


// The method displays the following output:
//    'plum'
//    ''
//    'pear'  
Rifat Bin Reza

The .NET regex engine does not support matching a piece of text other than a specific multicharacter string. In PCRE, you would use the (*SKIP)(*FAIL) verbs, but they are not supported in the native .NET regex library. You could use PCRE.NET, but .NET regex can usually handle those scenarios well with Regex.Split.

If you need to, say, match all but [anything here], you could use

var res = Regex.Split(s, @"\[[^][]*]").Where(m => !string.IsNullOrEmpty(m));

If the separator is a simple literal fixed string like __, just use String.Split.
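
If the separator is only known at run time, a minimal sketch could escape it and pass it to Regex.Split. The class and method names here (SeparatorSplit, SplitOn) are hypothetical and only serve the illustration; Regex.Escape is used so that characters such as . or | in the separator are treated literally.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SeparatorSplit
{
    // Hypothetical helper: splits on a separator supplied at run time.
    // Regex.Escape makes regex metacharacters in the separator match literally.
    public static string[] SplitOn(string input, string separator)
    {
        return Regex.Split(input, Regex.Escape(separator));
    }

    public static void Main()
    {
        // Regex-based split on the two-character separator.
        foreach (var part in SplitOn("first__second", "__"))
            Console.WriteLine(part);

        // The same split without regex, for a fixed literal separator.
        foreach (var part in "first__second".Split(new[] { "__" }, StringSplitOptions.None))
            Console.WriteLine(part);
    }
}
```

Both loops print first and then second. Splitting avoids the problem in the question because the whole separator is consumed as one unit instead of being tested one character at a time.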

As for your real problem, it seems all you need is

var res = Regex.Matches(s, "(?:\"[^\"]*\"|[^\r\n\"])+")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

See the regex demo

The pattern matches one or more (due to the final +) occurrences of either a quoted substring, i.e. a ", then zero or more chars other than ", then a closing " (the "[^"]*" branch), or (|) any single char other than CR, LF and " (the [^\r\n"] branch).
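
To see the pattern at work, here is a small self-contained sketch. The class name QuotedLines and the sample input are made up for illustration; the pattern itself is the one shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedLines
{
    // Returns each logical line; CR/LF inside double quotes do not end a line.
    public static List<string> Split(string s)
    {
        return Regex.Matches(s, "(?:\"[^\"]*\"|[^\r\n\"])+")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }

    public static void Main()
    {
        // Two logical lines; the first contains a quoted CRLF.
        var input = "a,\"multi\r\nline\",b\r\nc,d";
        foreach (var line in Split(input))
            Console.WriteLine("[{0}]", line);
    }
}
```

For this input the method returns two items: the first keeps the quoted CRLF intact, the second is c,d. Note two limitations of this approach: empty lines produce no match at all, and a stray unpaired " is skipped rather than reported, so input with unbalanced quotes may need extra handling.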

Wiktor Stribiżew