I'm parsing a file containing statements line by line. I want to:

  1. Identify all lines containing assignments.
  2. Replace identifiers of certain types (Input and Output).

A line is an assignment if it has one of the following two forms:

DataType Identifier = ...
Identifier = ...

The data type is optional; if present, it must be one of "R", "L", "H", "X" or "I". Spaces are allowed anywhere around the DataType and the Identifier. Examples of lines containing assignments:

L Input = ...
DigitalOutput = ...
  R Output= ...
H AnalogInput=...
  X Output   = ...

The expected result after processing the lines above is:

L Deprecated = ...
DigitalOutput = ...
  R Deprecated= ...
H AnalogInput=...
  X Deprecated   = ...
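
A minimal sketch of step 1 (recognizing assignment lines) under two assumptions that the question does not state. First, an identifier is a letter or underscore followed by letters, digits or underscores. Second, `==` is a comparison rather than an assignment. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AssignmentDetector
{
    // Optional data type (followed by whitespace), then an identifier, then "=".
    // Assumption: an identifier is a letter or underscore followed by word characters.
    // Assumption: "==" is a comparison, so (?!=) rejects it.
    private static readonly Regex AssignmentLine =
        new Regex(@"^\s*(?:[RLHXI]\s+)?[A-Za-z_]\w*\s*=(?!=)");

    // True if the line looks like "DataType Identifier = ..." or "Identifier = ...".
    public static bool IsAssignment(string line) => AssignmentLine.IsMatch(line);

    public static void Main()
    {
        string[] lines =
        {
            "L Input = ...",
            "DigitalOutput = ...",
            "  R Output= ...",
            "H AnalogInput=...",
            "  X Output   = ...",
            "if x == y" // hypothetical non-assignment line
        };

        foreach (string line in lines)
        {
            Console.WriteLine($"{IsAssignment(line)}: {line}");
        }
    }
}
```

Only lines for which IsAssignment returns true would then go through the identifier replacement.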

The file also contains statements other than assignments, so it's important to identify the assignment lines and only replace identifiers on those lines. I've tried to use a regular expression with a positive lookbehind and a positive lookahead:

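using System.Collections.Generic;
using System.Text.RegularExpressions;

public string ReplaceIdentifiers(string line)
{
  List<string> validDataTypes = new List<string>{"R", "L", "H", "X", "I"};
  List<string> identifiersToReplace = new List<string>{"Input", "Output"};
  string MyRegEx = ...; // the pattern I haven't managed to write (described below)
  Regex regEx = new Regex(MyRegEx);
  // Regex.Replace returns the modified string; it does not change line in place
  return regEx.Replace(line, "Deprecated");
}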

Where MyRegEx has the following form (pseudocode):

$@"(?<=...){Any of the two identifiers to replace}(?=...)"

The lookbehind should match one of:

  1. the start of the string,
  2. zero or more spaces, any of the valid data types, zero or more spaces, or
  3. zero or more spaces.

The lookahead should match zero or more spaces followed by =.

I haven't managed to get the regular expression right. How do I write the regular expression?

Wiktor Stribiżew
Andy Rehn
    This is not a question. This is a business requirement. You should research, try to do something, and then, if you have a particular code problem, you can ask. But where is the code? I don't see any. – jazb Nov 06 '19 at 07:46

2 Answers

Since .NET regex supports variable-length lookbehind, you can use the following pattern:

(?<=^\s*(?:[RLHXI]\s+)?)(?:Input|Output)(?=\s*=)

Then replace the matches with Deprecated.
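
To show what each part of the pattern does, here is the same pattern spread over several lines with comments. It uses RegexOptions.IgnorePatternWhitespace, so unescaped whitespace in the pattern is ignored and # starts a comment. This is an illustrative sketch, and the class and method names are made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternBreakdown
{
    // The answer's pattern, one part per line. With IgnorePatternWhitespace the
    // engine skips the layout whitespace and everything after '#' on each line.
    private const string Pattern = @"
        (?<=                 # lookbehind, variable length is allowed in .NET
            ^\s*             #   start of line, then optional whitespace
            (?:[RLHXI]\s+)?  #   optional data type plus at least one whitespace
        )
        (?:Input|Output)     # the identifier to replace
        (?=\s*=)             # lookahead: optional whitespace, then '='
    ";

    private static readonly Regex Rx = new Regex(
        Pattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline);

    public static string Apply(string text) => Rx.Replace(text, "Deprecated");

    public static void Main()
    {
        Console.WriteLine(Apply("  X Output   = ...")); // prints "  X Deprecated   = ..."
        Console.WriteLine(Apply("IOutput = ..."));      // prints "IOutput = ..." (unchanged)
    }
}
```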

C# example:

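using System;
using System.Text.RegularExpressions;

string input = "L Input = ...\n" +
               "DigitalOutput = ...\n" +
               "  R Output= ...\n" +
               "H AnalogInput=...\n" +
               "  X Output   = ...\n" +
               "IOutput = ...\n" +
               "Output = ...";

// Multiline makes ^ match at the start of every line, not just at the start of the string
Regex regEx = new Regex(@"(?<=^\s*(?:[RLHXI]\s+)?)(?:Input|Output)(?=\s*=)",
                        RegexOptions.Multiline);
// Replace returns a new string; input itself is not modified
string output = regEx.Replace(input, "Deprecated");
Console.WriteLine(output);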

Output:

L Deprecated = ...
DigitalOutput = ...
  R Deprecated= ...
H AnalogInput=...
  X Deprecated   = ...
IOutput = ...
Deprecated = ...
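
The question's ReplaceIdentifiers method handles one line at a time. A minimal sketch of adapting the pattern above to that per-line signature follows. Because each input is a single line, RegexOptions.Multiline is not needed, and the result of Regex.Replace is returned rather than discarded. The class name LineRewriter is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineRewriter
{
    // Same pattern as the answer. No RegexOptions.Multiline: each input is a single
    // line, so ^ only has to match at the start of the string.
    private static readonly Regex Rx =
        new Regex(@"(?<=^\s*(?:[RLHXI]\s+)?)(?:Input|Output)(?=\s*=)");

    // Regex.Replace returns a new string, so the result is returned to the caller.
    public static string ReplaceIdentifiers(string line) => Rx.Replace(line, "Deprecated");

    public static void Main()
    {
        string[] lines = { "L Input = ...", "DigitalOutput = ...", "  R Output= ...", "Output = ..." };
        foreach (string line in lines)
        {
            Console.WriteLine(ReplaceIdentifiers(line));
        }
    }
}
```

To process a file, the lines could be fed in one by one, for example via File.ReadLines(path) (path being a placeholder for the actual file).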

  • Thanks! I will try your solution! I use this site https://regex101.com/ and it doesn't allow quantifiers inside lookbehind, which was one of my problems... Do I need ?: before Input|Output, and why? – Andy Rehn Nov 06 '19 at 08:02
  • @AndyRehn It's called a non-capturing group. You should only use a capturing group (i.e., without `?:`) when you need to get the value of what's inside the group separated from the full match. You may check [this post](https://stackoverflow.com/q/3512471/8967612) for more info. – 41686d6564 stands w. Palestine Nov 06 '19 at 08:17
  • I just discovered that the expression doesn't match, for example, Output = ... How do I make [RLHXI] optional? – Andy Rehn Nov 06 '19 at 09:35
  • I found the answer: (?<=^\s*[RLHXI]?\s*)(?:Input|Output)(?=\s*=). When I replaced \s+ with \s* after the character class, the ? quantifier worked... – Andy Rehn Nov 06 '19 at 09:44
  • @AndyRehn You're right; I didn't take into account the case where the DataType is absent. Your pattern, however, has one downside: it will match things like `IOutput = ...`. Instead of making `[RLHXI]` and the whitespace following it optional separately, you should make them optional _as one unit_ (using a non-capturing group). That would be something like `(?<=^\s*(?:[RLHXI]\s+)?)(?:Input|Output)(?=\s*=)`. I've updated the answer. – 41686d6564 stands w. Palestine Nov 06 '19 at 10:31
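
The difference described in the last comment can be checked directly. This sketch applies both lookbehind variants to a line of the form IOutput = ..., and the class name is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionalGroupDemo
{
    // Variant from the comments: data type and trailing whitespace optional independently.
    public const string Separate = @"(?<=^\s*[RLHXI]?\s*)(?:Input|Output)(?=\s*=)";

    // Variant from the answer: "data type followed by whitespace" optional as one unit.
    public const string Grouped = @"(?<=^\s*(?:[RLHXI]\s+)?)(?:Input|Output)(?=\s*=)";

    public static void Main()
    {
        const string line = "IOutput = ...";
        Console.WriteLine(Regex.Replace(line, Separate, "Deprecated")); // prints "IDeprecated = ..."
        Console.WriteLine(Regex.Replace(line, Grouped, "Deprecated"));  // prints "IOutput = ..."
    }
}
```

With [RLHXI]?\s*, the I can be taken as a data type with no whitespace after it, so the Output inside IOutput gets replaced. Grouping [RLHXI]\s+ requires whitespace whenever a data type is present. Both variants still replace a bare Output = ... line.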

For the particular case shown, where every identifier to be replaced is preceded by a data type, your regex can be:

^(\s*[RLHXI]\s+)(?:Output|Input)(\s*=)

Replace with $1Deprecated$2 and use the multiline option (RegexOptions.Multiline).

If the type names and the identifiers to replace are not known until run time, you can build the pattern with string.Format using this format string:

^(\s*(?:{0})\s+)(?:{1})(\s*=)

The arguments are the two lists, each joined with | using string.Join:

string regex = string.Format(
    @"^(\s*(?:{0})\s+)(?:{1})(\s*=)",
    string.Join("|", validDataTypes), // you should probably escape these beforehand
    string.Join("|", identifiersToReplace)
    );
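
Putting the pieces together, here is a sketch of the string.Format approach. Following the code comment above, it escapes each list entry with Regex.Escape. The replacement $1Deprecated$2 puts the captured prefix (whitespace plus data type) and the \s*= suffix back around the new name. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FormatPatternDemo
{
    public static string BuildPattern(IEnumerable<string> validDataTypes, IEnumerable<string> identifiersToReplace) =>
        string.Format(
            @"^(\s*(?:{0})\s+)(?:{1})(\s*=)",
            // Regex.Escape makes each entry match literally, even if it contains metacharacters
            string.Join("|", validDataTypes.Select(Regex.Escape)),
            string.Join("|", identifiersToReplace.Select(Regex.Escape)));

    public static string ReplaceAll(string text, IEnumerable<string> validDataTypes, IEnumerable<string> identifiersToReplace)
    {
        // Multiline: ^ matches at the start of each line of text
        var regex = new Regex(BuildPattern(validDataTypes, identifiersToReplace), RegexOptions.Multiline);
        // $1 and $2 re-insert the two captured groups around the new name
        return regex.Replace(text, "$1Deprecated$2");
    }

    public static void Main()
    {
        var validDataTypes = new List<string> { "R", "L", "H", "X", "I" };
        var identifiersToReplace = new List<string> { "Input", "Output" };
        string input = "L Input = ...\nDigitalOutput = ...\n  R Output= ...\nH AnalogInput=...\n  X Output   = ...";
        Console.WriteLine(ReplaceAll(input, validDataTypes, identifiersToReplace));
    }
}
```

Like the pattern it is built from, this requires a data type, so a line such as Output = ... is left unchanged. Wrapping the data-type part in an optional group, ^(\s*(?:(?:{0})\s+)?)(?:{1})(\s*=), would also cover that case.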
Sweeper