I'm trying to write a regex that removes the leading whitespace from every line of a multiline string.

The following code does this, but it also greedily removes the blank lines between paragraphs, collapsing the text into one block.

How can I change the regex so that it removes leading whitespace from each line but leaves the blank lines intact?

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace test_regex_double_line
{
    class Program
    {
        static void Main(string[] args)
        {
            List<string> resources = new List<string>();

            resources.Add("Jim Smith\n\t123 Main St.\n\t\tWherever, ST 99999\n\nFirst line of information.\n\nSecond line of information.");

            foreach (var resource in resources)
            {
                var fixedResource = Regex.Replace(resource, @"^\s+", m => "", RegexOptions.Multiline);
                Console.WriteLine($"{resource}\n--------------\n{fixedResource}\n===========================");
            }
        }
    }
}
Edward Tanguay
  • Hmm, instead of `\s`, why don't you use `[\t ]`? (note the space character there...) Any particular reasoning for not doing that? –  May 31 '19 at 14:27
  • What are you trying to do exactly? The screenshot seems to have leading whitespace before the line `123 Main St.`. Have you tried `string.TrimStart()`? – the default. May 31 '19 at 14:28
  • Why not use `TrimStart()`? The regex won't run faster than a simple TrimStart. It won't save any memory either since `Replace` returns a new string anyway – Panagiotis Kanavos May 31 '19 at 14:30
  • `string result = Regex.Replace(source, @"^[\s-[\r\n]]+", "", RegexOptions.Multiline);` – Dmitry Bychenko May 31 '19 at 14:32
  • @PanagiotisKanavos `TrimStart()` will only trim the beginning of the full string, but I need to trim the beginning of each line in a multiline string. – Edward Tanguay May 31 '19 at 14:38
  • @elgonzo that doesn't remove tabs. `[\p{Zs}\t]` would work. C# (and Unicode) doesn't seem to have a single ["horizontal whitespace" class](https://stackoverflow.com/questions/3469080/match-whitespace-but-not-newlines) the way Perl does. `[\p{Zs}\t]` seems to produce the same character range – Panagiotis Kanavos May 31 '19 at 15:03
  • @PanagiotisKanavos, dang! You are right. Serves me right for just looking at the documentation and not double-checking it with an online regexer. \p treats tabs as control characters and not as space separators... –  May 31 '19 at 15:06
  • @elgonzo [the tab](https://www.fileformat.info/info/unicode/char/0009/index.htm) belongs to the control (Cc) category. – Panagiotis Kanavos May 31 '19 at 15:13

1 Answer

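The problem is that `\s` also matches `\n` and `\r`, so the pattern consumes the line breaks of empty lines. Let's remove all whitespace (`\s`) except `\n` and `\r` by using character class subtraction, i.e. the pattern `[\s-[\r\n]]+`.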

Code:

string resource = 
  "Jim Smith\n\t123 Main St.\n\t\tWherever, ST 99999\n\nFirst line of information.\n\nSecond line of information.";

string fixedResource = Regex.Replace(resource, @"^[\s-[\r\n]]+", "", RegexOptions.Multiline);

Console.Write(fixedResource);

Outcome:

Jim Smith
123 Main St.
Wherever, ST 99999

First line of information.

Second line of information.
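
To see the difference between the two patterns side by side, here is a minimal sketch on a shorter input. The class and method names (`LeadingWhitespaceDemo`, `StripGreedy`, `StripPerLine`) are made up for this illustration and are not part of the question's code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LeadingWhitespaceDemo
{
    // Original pattern: \s also matches '\n', so with Multiline the ^ anchor
    // matches at the start of an empty line and \s+ consumes its line break.
    public static string StripGreedy(string text) =>
        Regex.Replace(text, @"^\s+", "", RegexOptions.Multiline);

    // Character class subtraction: any whitespace except '\r' and '\n',
    // so a match can never cross a line boundary.
    public static string StripPerLine(string text) =>
        Regex.Replace(text, @"^[\s-[\r\n]]+", "", RegexOptions.Multiline);

    public static void Main()
    {
        string input = "a\n\tb\n\nc";

        Console.WriteLine(StripGreedy(input) == "a\nb\nc");     // True: blank line lost
        Console.WriteLine(StripPerLine(input) == "a\nb\n\nc");  // True: blank line kept
    }
}
```

Only the tab before `b` is removed by the second pattern; the empty line between `b` and `c` survives because `\n` is excluded from the class.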

Edit: If you want to process a collection (say, `List<string>`), it's reasonable to create the `Regex` instance once, outside the loop or Linq query, for performance reasons (see Panagiotis Kanavos's comment):

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

List<string> resources = new List<string>() {
  "Jim Smith\n\t123 Main St.\n\t\tWherever, ST 99999\n\nFirst line of information.\n\nSecond line of information.",
};

// Build the Regex once and reuse it for every item in the collection
Regex regex = new Regex(@"^[\s-[\r\n]]+", RegexOptions.Multiline);

List<string> fixedResources = resources
  .Select(resource => regex.Replace(resource, ""))
  .ToList();
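
As for the `TrimStart()` suggestion from the comments: on its own it only trims the start of the whole string, but it can be applied line by line without any regex. A rough sketch, assuming only spaces and tabs count as indentation (a narrower set than `\s`); the names `LineTrimmer` and `TrimEachLine` are hypothetical:

```csharp
using System;
using System.Linq;

public static class LineTrimmer
{
    // Split on '\n', trim leading spaces and tabs from each line, then rejoin.
    // Passing the characters explicitly keeps a '\r' on CRLF blank lines intact,
    // which a parameterless TrimStart() would strip.
    public static string TrimEachLine(string text) =>
        string.Join("\n", text.Split('\n').Select(line => line.TrimStart(' ', '\t')));

    public static void Main()
    {
        string input = "Jim Smith\n\t123 Main St.\n\t\tWherever, ST 99999\n\nFirst line of information.";
        Console.WriteLine(TrimEachLine(input));
    }
}
```

Whether this or the regex is preferable depends on the input; the regex handles every whitespace character that `\s` covers, while this version handles only the two listed.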
Dmitry Bychenko