
I'm trying to replace the pipe delimiter character with a space when it appears inside quotes. The issue is that I get too many false positives because some fields are empty strings (""). I only want to replace the pipe if there is text between the quotes. The regex pattern I'm using is from another Stack Overflow post, as my regex skills are lacking.

Data sample:

"Hello"|"Green | Blue"|123.45|""|""|""|5|45

Code I'm using:

using System;
using System.Text.RegularExpressions;

internal class Program
{
    public static void Main()
    {
        string pattern = @"(?:(?<="")|\G(?!^))(\s*[^""|\s]+(?:\s+[^""|\s]+)*)\s*\|\s*(?=[^""]*"")";
        string substitution = @"\1 \2";
        string input = @"""20190430|""Test  Text""|""""|""""|""Manual""|""""|""Machine""|""""|""""|10.00|""""|0.00|||0.00||5600.00||||""A+""|""""|40.00||""""|""Vision Service |Troubleshoot""|57|""Y""|838|""Yellow Maroon""|850||""FL""||||0.00|||||||||||""""||""""||""""|||""""||||||""""||""""|""""||""""|""""||||||""""|""""|""""||||||||1||""";
        RegexOptions options = RegexOptions.Multiline;
        Regex regex = new Regex(pattern, options);
        string result = regex.Replace(input, substitution);
        Console.WriteLine("Result:" + result);
        Console.ReadKey();
    }
}

It replaces the pipe in 'Green | Blue' just fine, but it also replaces pipes between quotes later in the line, which breaks the file because columns get removed.

Update: I replaced the input with an actual sample from the file I'm processing. The regex finds the match but doesn't replace the pipe. What am I missing?

Jay Wehner
  • It is the same solution as replacing commas outside double quotes. See: https://stackoverflow.com/questions/3147836/c-sharp-regex-split-commas-outside-quotes – jdweng Jul 02 '19 at 15:03
  • Also, to be clear, the final output should look like this: "Hello"|"Green Blue"|123.45|""|""|""|5|45 – Jay Wehner Jul 02 '19 at 22:04
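Along the lines of jdweng's comment, a regex-free alternative is to scan each line once and track whether the current character is inside double quotes. This is a minimal sketch, assuming one record per string and that a " is never escaped inside a field (so every " simply opens or closes a field); the class and method names are hypothetical.

```csharp
using System.Text;

public static class QuotedPipeScanner
{
    // Replaces each '|' found between an opening and a closing double quote
    // with a single space, collapsing any whitespace around that pipe.
    // Pipes outside quotes (the real column delimiters) are left untouched.
    public static string ReplacePipesInsideQuotes(string line)
    {
        var sb = new StringBuilder(line.Length);
        bool inQuotes = false;        // assumption: quotes are never escaped inside a field
        bool skipWhitespace = false;  // true right after a replaced pipe

        foreach (char c in line)
        {
            if (skipWhitespace && char.IsWhiteSpace(c))
                continue;             // drop whitespace that followed the pipe
            skipWhitespace = false;

            if (c == '"')
            {
                inQuotes = !inQuotes; // each quote opens or closes a field
                sb.Append(c);
            }
            else if (c == '|' && inQuotes)
            {
                // Drop whitespace that preceded the pipe, then emit one space.
                while (sb.Length > 0 && char.IsWhiteSpace(sb[sb.Length - 1]))
                    sb.Length--;
                sb.Append(' ');
                skipWhitespace = true;
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

Unlike the regex approaches, this does not care how many pipes or words a quoted field contains, and empty fields such as "" pass through unchanged because their neighbouring pipes sit outside the quotes.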

2 Answers


If there must be text between the double quotes, on both sides of the pipe, you might use:

(?<=")(\s*[^"\s|]+)\s*\|\s*([^\s"|]+\s*)(?=")

In the replacement, use $1 $2.

Explanation

  • (?<=") Positive lookbehind, assert what is on the left is "
  • (\s*[^"\s|]+) Capture in group 1: match 0+ whitespace chars, then 1+ chars that are not ", | or whitespace
  • \s*\|\s* Match a | surrounded by 0+ whitespace chars on each side
  • ([^\s"|]+\s*) Capture in group 2: match 1+ chars that are not ", | or whitespace, then 0+ whitespace chars
  • (?=") Positive lookahead, assert what is on the right is "
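To try this pattern from C#, here is a minimal sketch that applies it with the $1 $2 replacement; the class and method names are hypothetical. In a verbatim string each " of the pattern is written as "".

```csharp
using System.Text.RegularExpressions;

public static class SinglePipeInQuotes
{
    // Pattern from the answer as a C# verbatim string ("" stands for ").
    // The lookbehind/lookahead check the quotes without consuming them,
    // so the quotes survive the replacement.
    private static readonly Regex Pattern =
        new Regex(@"(?<="")(\s*[^""\s|]+)\s*\|\s*([^\s""|]+\s*)(?="")");

    // $1 is the text before the pipe, $2 the text after it.
    public static string Fix(string line) => Pattern.Replace(line, "$1 $2");
}
```

Because both groups must be single runs of non-whitespace characters directly touching the quotes, this version leaves fields with several pipes ("x390|Gigabyte|Awesome") or several words before the pipe ("Vision Service |Troubleshoot") unchanged; the \G-based pattern in the edit handles those.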

.NET Regex demo


Edit

If you want to replace multiple pipes between the double quotes with a space, you could use the \G anchor to assert the position at the end of the previous match.

In the replacement, use the first capturing group followed by a single space: $1 and then a space.

(?:(?<=")|\G(?!^))(\s*[^"|\s]+(?:\s+[^"|\s]+)*)\s*\|\s*(?=[^"]*")

Explanation

  • (?: Non capturing group
    • (?<=") Assert what is on the left is "
    • | Or
    • \G(?!^) Assert position at the end of the previous match
  • ) Close non capturing group
  • ( Capture group 1
    • \s*[^"|\s]+ Match 0+ whitespace chars, followed by 1+ chars that are not ", | or whitespace
    • (?:\s+[^"|\s]+)* Repeat 0+ times: 1+ whitespace chars followed by 1+ chars that are not ", | or whitespace
  • ) Close capturing group 1
  • \s*\|\s* Match a | surrounded by 0+ whitespace chars on each side
  • (?=[^"]*") Assert that a " follows after 0+ chars that are not ", so the pipe is still inside the quoted field
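A minimal sketch of this pattern in C# follows; the class and method names are hypothetical. One detail worth knowing: .NET replacement strings refer to groups as $1, not \1. A \1 in the replacement is copied literally, so the question's @"\1 \2" substitution inserts the text \1 \2 instead of group 1. The pattern has only one capturing group, so the replacement here is "$1 " (group 1 plus a space).

```csharp
using System.Text.RegularExpressions;

public static class PipesInQuotes
{
    // The \G-based pattern as a C# verbatim string ("" stands for ").
    // A match starts either right after a quote or exactly where the
    // previous match ended, so several pipes in one field are handled.
    private static readonly Regex Pattern = new Regex(
        @"(?:(?<="")|\G(?!^))(\s*[^""|\s]+(?:\s+[^""|\s]+)*)\s*\|\s*(?=[^""]*"")");

    // Keep group 1 (the text before the pipe) and add a single space;
    // the pipe and the whitespace around it are dropped.
    public static string Fix(string line) => Pattern.Replace(line, "$1 ");
}
```

Empty fields such as "" are never matched, because group 1 needs at least one character that is not ", | or whitespace right after the quote.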

See another .NET regex demo


The fourth bird

My guess is that we might also want to keep only one space in the text, and this expression

"([^"]+?)\s+\|\s+([^"]+?)"

with a replacement of "$1 $2" might work. The quotes are part of the match, so they have to be written back in the replacement.

Demo

Example

using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"""([^""]+?)\s+\|\s+([^""]+?)""";
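        // .NET uses $1/$2 for groups; the quotes are consumed by the match, so they are re-added here
        string substitution = @"""$1 $2""";
        string input = @"""Hello""|""Green | Blue""|123.45|""""|""""|""""|5|45";
        RegexOptions options = RegexOptions.Multiline;
        
        Regex regex = new Regex(pattern, options);
        string result = regex.Replace(input, substitution);
        Console.WriteLine(result); // "Hello"|"Green Blue"|123.45|""|""|""|5|45
    }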
}
Emma
  • After further testing, this is not working either. The text can be anything, not just whole words, so this is an acceptable field: "x390|Gigabyte|Awesome"|$249.00|"RGB"|""|""|""|0 – Jay Wehner Jul 02 '19 at 21:59