3

We have a requirement to transform a string containing a date in dd/mm/yyyy format to ddmmyyyy format. (In case you want to know why I am storing dates in a string: my software processes bulk transaction files, a line-based textual file format used by a bank.)

And I am currently doing this:

string oldFormat = "01/01/2014";
string newFormat = oldFormat.Replace("/", "");

Sure enough, this converts "01/01/2014" to "01012014". But my question is, does the replace happen in one step, or does it create an intermediate string (e.g.: "0101/2014" or "01/012014")?


Here's the reason why I am asking this:

I am processing transaction files ranging in size from a few kilobytes to hundreds of megabytes. So far I have not had a performance/memory problem, because I am still testing with very small files. But when it comes to megabytes, I am not sure if I will have problems with these additional strings. I suspect that would be the case because strings are immutable. With millions of records, this additional memory consumption will build up considerably.

I am already using StringBuilders for output file creation. And I also know that the discarded strings will be garbage collected (at some point before the end of time). I was wondering if there is a better, more efficient way of replacing all occurrences of a specific character/substring in a string, one that does not create an additional string.

sampathsris
  • You should try using Regex.Replace and compare performance. I once had to remove unnecessary NewLine characters from a file of size ~1MB, and regex made a lot of difference (measured in minutes...). I had to do a conditional replace and some other text operations though, so I recommend testing it in this exact case – Arie Oct 10 '14 at 12:06
  • 2
    I think it allocates only one string for one entire Replace. Not one string for each replace of an occurrence. – Selman Genç Oct 10 '14 at 12:08
  • `String ReplaceInternal` is a method implemented externally. I don't think we can know what is going on under the hood. – Soner Gönül Oct 10 '14 at 12:12

4 Answers

7

Sure enough, this converts "01/01/2014" to "01012014". But my question is, does the replace happen in one step, or does it create an intermediate string (e.g.: "0101/2014" or "01/012014")?

No, it doesn't create an intermediate string for each replacement. But it does create a new string, because, as you already know, strings are immutable.

Why?

There is no reason to create a new string on each replacement: it's very simple to avoid, and avoiding it gives a huge performance boost.

If you are very interested, referencesource.microsoft.com and the SSCLI 2.0 source code demonstrate this (see also: how-to-see-code-of-method-which-marked-as-methodimploptions-internalcall):

FCIMPL3(Object*, COMString::ReplaceString, StringObject* thisRefUNSAFE,
        StringObject* oldValueUNSAFE, StringObject* newValueUNSAFE)
{
    // unnecessary code omitted
    while (((index = COMStringBuffer::LocalIndexOfString(thisBuffer, oldBuffer,
             thisLength, oldLength, index)) > -1) && (index <= endIndex - oldLength))
    {
        replaceIndex[replaceCount++] = index;
        index += oldLength;
    }

    if (replaceCount != 0)
    {
        // Calculate the new length of the string and ensure that we have
        // sufficient room.
        INT64 retValBuffLength = thisLength -
            ((oldLength - newLength) * (INT64)replaceCount);

        gc.retValString = COMString::NewString((INT32)retValBuffLength);
        // unnecessary code omitted
    }
}

As you can see, the loop first records every match in replaceIndex, and retValBuffLength is then calculated from replaceCount, so the result string is allocated once at its final length. The real implementation may be a bit different for .NET 4.0 (SSCLI 4.0 is not released), but I assure you it's not doing anything silly :-).
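To make the idea concrete, here is a rough C# sketch of the same two-pass approach: find all matches first, compute the final length, then copy once. This is only an illustration of the technique; `TwoPassReplace` is a made-up name, it is not the CLR's code, and unlike the native version it goes through a temporary `char[]` before creating the string.

```csharp
using System;
using System.Collections.Generic;

public static class TwoPassReplace
{
    // Hypothetical helper: mirrors the idea in the excerpt, not the actual runtime code.
    public static string Replace(string source, string oldValue, string newValue)
    {
        if (string.IsNullOrEmpty(oldValue))
            throw new ArgumentException("oldValue must be non-empty", nameof(oldValue));

        // Pass 1: record where every match starts.
        var matches = new List<int>();
        int index = 0;
        while ((index = source.IndexOf(oldValue, index, StringComparison.Ordinal)) >= 0)
        {
            matches.Add(index);
            index += oldValue.Length;
        }

        if (matches.Count == 0)
            return source;

        // The final length is known before anything is copied.
        int newLength = source.Length - (oldValue.Length - newValue.Length) * matches.Count;
        var buffer = new char[newLength];

        // Pass 2: copy the unchanged runs and the replacements into the buffer.
        int read = 0, write = 0;
        foreach (int match in matches)
        {
            int run = match - read;
            source.CopyTo(read, buffer, write, run);
            write += run;
            newValue.CopyTo(0, buffer, write, newValue.Length);
            write += newValue.Length;
            read = match + oldValue.Length;
        }
        source.CopyTo(read, buffer, write, source.Length - read);

        return new string(buffer);
    }
}
```

No string is built per occurrence here: the only result string is created at the very end, which is the point the excerpt makes.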

I was wondering if there is a better, more efficient way of replacing all occurrences of a specific character/substring in a string, one that does not create an additional string.

Yes. Use a reusable StringBuilder with a capacity of ~2000 characters, which avoids any memory allocation. This is only true if the replacement lengths are equal, and it can get you a nice performance gain if you're in a tight loop.

Before writing anything, run benchmarks with big files, and see if the performance is enough for you. If performance is enough - don't do anything.
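For illustration, a reusable builder could look roughly like the sketch below. The class name `DateFieldConverter` and the method `WriteWithoutSlashes` are made up for this example, and I'm assuming each record fits into the 2000-character capacity. No new string is created per record, but whether `StringBuilder.Replace` allocates anything internally is not something this sketch guarantees, so measure it.

```csharp
using System;
using System.IO;
using System.Text;

public sealed class DateFieldConverter
{
    // One builder reused for every record (capacity taken from the suggestion above).
    private readonly StringBuilder _scratch = new StringBuilder(2000);

    // Hypothetical helper: strips '/' from the field and writes the result to the output.
    public void WriteWithoutSlashes(string field, TextWriter output)
    {
        _scratch.Clear();            // keeps the existing buffer, only resets the length
        _scratch.Append(field);      // copies the characters into the reused buffer
        _scratch.Replace("/", "");   // modifies the builder's own buffer

        // Write char by char so that ToString() (a new string) is never called.
        for (int i = 0; i < _scratch.Length; i++)
            output.Write(_scratch[i]);
    }
}
```

You would create one `DateFieldConverter` up front and call `WriteWithoutSlashes` for each date field, passing the writer for your output file.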

Erti-Chris Eelmaa
  • 1
    @Alovchin, yes, I discovered it myself a few hours ago. It's only 2.0, but it definitely gives you a nice idea of what's going on :-) – Erti-Chris Eelmaa Oct 10 '14 at 12:34
  • @ChrisEelmaa How did you find out that the [`String.ReplaceInternal` method](http://referencesource.microsoft.com/mscorlib/R/35ab9efe11757286.html) calls this code in CLI 2.0? – Soner Gönül Oct 10 '14 at 12:45
  • @SonerGönül: edited my post & added clarifications. As of right now, the only way to see `String.ReplaceInternal` would be to disassemble your `mscorlib.dll`. SSCLI 2.0 is good enough to argue about this though. `grepWin` is your friend ;) – Erti-Chris Eelmaa Oct 10 '14 at 13:30
5

Well, I'm not a .NET development team member (unfortunately), but I'll try to answer your question.

Microsoft has a great site of .NET Reference Source code, and according to it, String.Replace calls an external method that does the job. I wouldn't argue about how it is implemented, but there's a small comment to this method that may answer your question:

// This method contains the same functionality as StringBuilder Replace. The only difference is that
// a new String has to be allocated since Strings are immutable

Now, if we follow the StringBuilder.Replace implementation, we can see what it actually does inside.

A little more on string objects:

Although String is immutable in .NET, this is not some kind of limitation, it's a contract. String is actually a reference type, and what it includes is the length of the actual string + the buffer of characters. You can actually get an unsafe pointer to this buffer and change it "on the fly", but I wouldn't recommend doing this.

Now, the StringBuilder class also holds a character array, and when you pass a string to its constructor it actually copies the string's buffer into its own (see Reference Source). What it doesn't have, though, is the contract of immutability, so when you modify a string using StringBuilder you are actually working with the char array. Note that when you call ToString() on a StringBuilder, it creates a new "immutable" string and copies its buffer there.

So, if you need a fast and memory-efficient way to make changes in a string, StringBuilder is definitely your choice, especially since Microsoft explicitly recommends using StringBuilder if you "perform repeated modifications to a string".
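A small sketch of the difference described above, in case it helps. `MutabilityDemo` and its method names are made up for this example; the calls themselves (`StringBuilder.Replace`, `String.Replace`, `ReferenceEquals`) are standard.

```csharp
using System;
using System.Text;

public static class MutabilityDemo
{
    // True when StringBuilder.Replace hands back the very builder it was called on,
    // i.e. the change happened in the builder's own buffer.
    public static bool BuilderReplaceReturnsSameInstance()
    {
        var sb = new StringBuilder("01/01/2014");
        StringBuilder result = sb.Replace("/", "");
        return ReferenceEquals(sb, result) && sb.ToString() == "01012014";
    }

    // True when String.Replace leaves the original untouched and returns another object.
    public static bool StringReplaceReturnsNewInstance()
    {
        string original = "01/01/2014";
        string replaced = original.Replace("/", "");
        return !ReferenceEquals(original, replaced) && original == "01/01/2014";
    }
}
```

Both methods should return true: the builder is changed in place, while the string version produces a second object and leaves the first one as it was.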

Alovchin
  • The contract for `String.Replace` does not require that the implementation avoid the creation of unnecessary intermediate `String` objects, but it is unlikely that such an implementation would be used when it is so easily avoided. – Sam Harwell Oct 10 '14 at 12:08
  • So I have almost the same answer as you and I answer before you... you get an up vote and I get a down vote..... what gives?? – kjbartel Oct 10 '14 at 12:12
  • @kjbartel: in what way is your answer even similar to this? You say that it always creates a new string. But OP asked if it creates a new string for every occurrence of the string that should be replaced, not once per `Replace`-call. This tries to find a source where it is documented how `String.Replace` is actually implemented. The comment suggests that only one string is created. – Tim Schmelter Oct 10 '14 at 12:17
  • @SamHarwell I wouldn't argue about the actual implementation because it well might be implemented in native code, but it definitely doesn't create new intermediate strings. Actually Microsoft itself [recommends](http://msdn.microsoft.com/en-us/library/2839d5h5) to use StringBuilder if you _"perform repeated modifications to a string"_. – Alovchin Oct 10 '14 at 12:17
0

I haven't found any sources, but I strongly doubt that the implementation creates a new string for every replacement. I'd also implement it with a StringBuilder internally. So String.Replace is absolutely fine if you want to run a single replace on a huge string. But if you have to replace many times, you should consider using StringBuilder.Replace, because every call to String.Replace creates a new string.

So you can use StringBuilder.Replace since you're already using a StringBuilder.
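Something along these lines, assuming your output goes through one StringBuilder. The extension method `AppendDateWithoutSlashes` is a made-up name; it uses the `Replace(oldValue, newValue, startIndex, count)` overload so that only the part just appended is touched.

```csharp
using System;
using System.Text;

public static class OutputBuilderExtensions
{
    // Hypothetical helper: appends a date field and strips '/' only from that field.
    public static StringBuilder AppendDateWithoutSlashes(this StringBuilder output, string date)
    {
        int start = output.Length;   // where the appended field begins
        output.Append(date);
        // Restrict the replace to the appended range so earlier output stays untouched.
        return output.Replace("/", "", start, date.Length);
    }
}
```

Usage would be `output.AppendDateWithoutSlashes("01/01/2014");` wherever you currently append the converted date. Slashes that are already in the builder before that point are left alone.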

Tim Schmelter
  • Thanks. Well, it turns out my question is an [XY problem](http://meta.stackexchange.com/q/66377/262588), and you have given a nice tip to solve X (efficient replacing). But I would also like to know the answer for Y (if replacing multiple occurrences creates multiple strings). – sampathsris Oct 10 '14 at 12:08
  • 1
    @Krumia: i haven't found any sources but i strongly doubt that the implementation creates always new strings. I'd implement it also with a StringBuilder internally. Then `String.Replace` is absolutely fine if you want to replace once a huge string. But if you have to replace it many times you should consider to use `StringBuilder.Replace` because every call of `Replace` creates a new string (i'll add this comment to my answer). – Tim Schmelter Oct 10 '14 at 12:10
0

There is no string method for that. You are on your own. But you can try something like this:

string oldFormat = "dd/mm/yyyy"; // e.g. "01/01/2014"

// Split on '/' and join the three parts without a separator.
string[] dt = oldFormat.Split('/');
string newFormat = string.Format("{0}{1}{2}", dt[0], dt[1], dt[2]);

or

StringBuilder sb = new StringBuilder(dt[0]);
sb.AppendFormat("{0}{1}", dt[1], dt[2]);
eakgul