I'd like to know if there's a way to avoid the foreach loop in the following code:

List<string> lines1 = new List<string>();
List<string> lines2 = new List<string>();
lines1.AddRange(File.ReadAllLines("in.txt"));
foreach(string s in lines1)
    lines2.Add(Regex.Replace(s, "bim(.*)", "bom$1"));

Note that the loop also requires two lists during processing. My goal is to apply a regex to each string inside a list in situ.

Dmitry Grigoryev
  • You can't change the current value of an iterator in a `foreach` iteration, so use a regular `for` loop instead. – Matt Burland Jul 29 '15 at 15:28
  • Do you mean you want to update the lines in `lines1` while iterating over it? – CodeCaster Jul 29 '15 at 15:28
  • @CodeCaster I'd like to avoid iterating at all, if possible. I want to write something like `lines1.Transform(s => Regex.Replace(s, "bim(.*)", "bom$1"));` – Dmitry Grigoryev Jul 29 '15 at 15:30
  • @DmitryGrigoryev: It is impossible to do something to *every item in a collection* without actually iterating through that collection. It cannot be better than `O(n)` – Matt Burland Jul 29 '15 at 15:32
  • @MattBurland, Fair remark, thanks. I'd prefer not to write any loop at all if possible. – Dmitry Grigoryev Jul 29 '15 at 15:32
  • @MattBurland, I know, but it's not about performance. I just want the framework to iterate through the collection for me. I mean, I use `lines.RemoveAll(s => s.Contains("bim"))`, why can't a regex be applied in the same way? – Dmitry Grigoryev Jul 29 '15 at 15:37
  • Because it isn't. But you could easily write an extension method of your own to do that. – Matt Burland Jul 29 '15 at 15:38

3 Answers

6

You say you don't want to iterate. Then don't create a collection to begin with, but read the entire file in one string:

string input = File.ReadAllText("in.txt");
string output = Regex.Replace(input, "bim(.*)", "bom$1");

Then, if you want the individual lines, split the output as explained in "Easiest way to split a string on newlines in .NET?":

string[] outputLines = output.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
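
Splitting on `Environment.NewLine` alone only recognizes the current platform's line ending. The following is a minimal sketch of the same idea that splits on all three common line endings instead; the class and method names (`WholeTextReplace`, `ReplaceAndSplit`) are hypothetical and only serve the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeTextReplace
{
    // Hypothetical helper: applies the replacement to the whole text,
    // then splits the result into lines.
    public static string[] ReplaceAndSplit(string text)
    {
        string output = Regex.Replace(text, "bim(.*)", "bom$1");

        // "\r\n" is listed first so that a Windows line ending
        // is treated as one break rather than two.
        return output.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);
    }
}
```

Two caveats with this approach: unlike `File.ReadAllLines`, the split produces a trailing empty string when the text ends with a line break, and without `RegexOptions.Singleline` the `.` in the pattern stops at `\n` but still matches `\r`.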
CodeCaster
  • It *might* be worth noting that you are still iterating over the string implicitly, of course. – Matt Burland Jul 29 '15 at 15:34
  • @Matt as in "the regex internally iterates over the characters in a string", sure, but OP seems to not want to use `for` or `foreach`. – CodeCaster Jul 29 '15 at 15:35
  • That's a nice solution, thanks! I'm still interested if there's a way to apply a regex to a string collection though. – Dmitry Grigoryev Jul 29 '15 at 15:40
  • While this does eliminate explicit iteration, there are a few subtle differences between `ReadAllLines` and `ReadAllText`/`Split`: 1) `ReadAllLines` accounts for all 3 `NewLine` possibilities (`\r`, `\n`, and `\r\n`). 2) There is a chance (probably slight) that the replace pattern could cross a line break (e.g. `bim__\n__`) and get replaced when it would not when looking at individual lines. – D Stanley Jul 29 '15 at 15:41
  • @DmitryGrigoryev Not without iterating either explicitly with `for` or implicitly with LINQ. – D Stanley Jul 29 '15 at 15:41
4

You can't do it with foreach because you can't modify the collection whilst iterating over it, but you can use for:

List<string> lines = new List<string>(File.ReadAllLines("in.txt"));
for(int i = 0; i < lines.Count; i++)
    lines[i] = Regex.Replace(lines[i],"bim(.*)","bom$1");

Or a one-liner:

List<string> lines = File.ReadLines("in.txt")
                         .Select(s => Regex.Replace(s ,"bim(.*)","bom$1"))
                         .ToList();

Note that ReadLines does not read the entire file into memory, so the projection will transform the line as it is read from the file (meaning that a second collection is not created).
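
If the transformed lines are only going to be written back to disk, the `ToList()` call can be dropped as well. The following is a minimal sketch under that assumption; the names `FileTransform` and `ReplaceInFile` are hypothetical, and the two paths are assumed to be different files, because `ReadLines` keeps the input open while it is being enumerated.

```csharp
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class FileTransform
{
    // Hypothetical helper: streams lines from inputPath, applies the regex
    // to each one, and writes the results to outputPath.
    public static void ReplaceInFile(string inputPath, string outputPath)
    {
        // Lazy sequence: nothing is read until WriteAllLines enumerates it.
        var transformed = File.ReadLines(inputPath)
                              .Select(s => Regex.Replace(s, "bim(.*)", "bom$1"));

        File.WriteAllLines(outputPath, transformed);
    }
}
```

Here each line is read, transformed, and written in turn, so no list of lines is held in memory.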

D Stanley
  • `lines1.Select(s => Regex.Replace(s, "bim(.*)", "bom$1"))` is exactly what I was looking for! Thanks! It's not exactly *in situ* but I'll take it! Sidenote: the name `Select` was really hard to guess. – Dmitry Grigoryev Jul 29 '15 at 15:45
  • @Dmitry using Select and ToList is essentially the same as using foreach and adding the result to a new list... Why do you prefer this over a conventional loop? – CodeCaster Jul 29 '15 at 16:06
  • @DmitryGrigoryev I have updated the answer to use `ReadLines` instead of `ReadAllLines` which will transform the data in-place without creating another collection. – D Stanley Jul 29 '15 at 16:11
  • @DmitryGrigoryev: The `Select` is from the SQL `Select`. It's pretty obvious when you are dealing with a dataset that you pulled from a database, but is perhaps less obvious when you aren't thinking in that frame of mind. – Matt Burland Jul 29 '15 at 17:43
3

Just use a regular for loop and you avoid the need for an extra list:

for (var i=0; i<lines1.Count; i++)
{
    lines1[i] = Regex.Replace(lines1[i],"bim(.*)","bom$1");
}

Note, however, that you are still creating a new string for every string in lines1 because strings are immutable.

Or, if you want, you can write an extension method. Something like this should work:

public static class Extensions
{
    public static IEnumerable<string> RegexReplace (this IEnumerable<string> strings, Regex regex, string replacement)
    {
        foreach (var s in strings)
        {
            yield return regex.Replace(s, replacement);
        }
    }
}

And you could call it like this:

var lines1 = File.ReadLines("in.txt").RegexReplace(new Regex("bim(.*)"), "bom$1");

This extension applies a regex to every string in a collection. Because it uses deferred execution, it won't actually do anything until you iterate the result. So, for example, if you only needed to check the first line (perhaps to decide whether the rest of the file should be processed), you'd be able to stop early without looking at the rest of the lines. In a case like that, the best case is O(1).
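
For the in situ behaviour the question asks for, the `for` loop from the top of this answer can be wrapped in an extension method too. The following is a minimal sketch; `Transform` is a hypothetical name taken from the question's comment, not a framework method.

```csharp
using System;
using System.Collections.Generic;

public static class ListExtensions
{
    // Hypothetical helper, not part of the .NET framework:
    // replaces every element of the list with selector(element).
    public static void Transform<T>(this IList<T> list, Func<T, T> selector)
    {
        // An index-based loop is used because foreach does not allow
        // assigning to the element being visited.
        for (int i = 0; i < list.Count; i++)
        {
            list[i] = selector(list[i]);
        }
    }
}
```

With that in place, the call from the question would read `lines1.Transform(s => Regex.Replace(s, "bim(.*)", "bom$1"));`. Unlike the lazy `RegexReplace` above, this runs immediately and overwrites the list's elements, so no second list is needed, although a new string is still created for each element.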

Matt Burland