performance while doing string concatenation - algorithm string strings c#

Question

I was using the following code to append strings

string res = string.Empty;
int ite = 100000;
for(int i= 0; i < ite; i++)
{
    res += "5";
}

It was taking a lot of time, so I later changed the code to

string res = string.Empty;
int ite = 100000;
res = getStr(ite / 2) + getStr(ite - (ite / 2)); 

//body of getStr method
private static string getStr(int p)
{
    if (p == 1)
        return "5";
    else if (p == 0)
        return string.Empty;
    string r1 = getStr(p / 2); //recursive
    string r2 = getStr(p - (p / 2)); //recursive  
    return (r1 + r2);
}

which in my opinion does nothing actually as the number of times strings are concatenated is approximately the same as in previous approach.

But using this approach there was significant improvement in performance as the code which was taking around 2500 ms (on my machine) was now taking 10 ms.

I ran a profiler over cpu time and couldn't arrive at an understanding as to why there is an improvement in performance. Can anyone please explain this.

Note: I am intentionally not using StringBuilder, in order to understand the above.

Matt Burland · Accepted Answer · 2015-07-23T17:46:22.050

You need to think about why string concatenation is slow. Strings are immutable, so when you do:

someString+= "5";

You have to copy the entire contents of someString and to another string that is one larger and then copy in the 5 part. If you think about it, this gets slower and slower the longer the string gets.

With your recursive function you are doing a divide and conquer strategy to help to minimize the number of big string concatenations you need. For example, if you had a length of 8, in the first case you'd be doing:

"5" + "5" 
"55" + "5"
"555" + "5"
"5555" + "5"
"55555" + "5"
"555555" + "5"
"5555555" + "5"    // 7 total concatenations

In your recursive model you are doing:

"5" + "5"         // Four times
"55" + "55"       // twice
"5555" + "5555"   // once

So you are doing less big concatenations.

And, of course, I assume the OP knows this from their comment, but for anybody else; if you need to concatenate any non-trivial number of strings, use StringBuilder as it is optimized for building strings by Appending them together.

score 1 · Answer 2 · edited May 23 '17 at 11:43

Assuming - as per Matt Burland's answer - that the time cost of creating a string of length n by one of the given algorithms is dominated by the number of characters copied, the observed timings may be largely explained by calculating this for both algorithms. This yields O(n²) and O(n log n) and, for length 10,000, a ratio of 348:1. The algorithm may be improved to O(n) in Java, but apparently not in .NET.

Cost of the improved algorithm

Inspection of the improved algorithm shows that the number c(n) of characters copied obeys the following recurrence relation:

c(0) = 0
c(1) = 1
c(n) = c(⌊n/2⌋) + c(⌈n/2⌉) + n

This may be solved to yield

c(2^k + a) = (k + 1)2^k + (k + 3)a

where k and a are chosen so that n = 2^k + a , a < 2^k ; this is readily verified by complete induction. This is O(k 2^k), i.e. O(n log₂n), i.e. O(n log n),

Explanation: Comparison of costs

The original algorithm clearly copies n(n+1)/2 characters, and is thus O(n²).

The revised algorithm clearly copies far fewer characters; for the given 10,000 character string:

c(10000) =
c(2¹³ + 1808) =
(13+1) * 8192 + 16 * 1808 =
143,616

while the original algorithm copies 50,005,000 characters, a ratio of approximately 1 : 348, consistent to well within an order of magnitude with the observed ratio of 1:250. The imperfect match does suggest that other factors such as memory management may be significant.

Further optimisation

Given that the string is filled with a single character, and assuming that String.Substring does not make a copy of the string, which is true in Java but not .NET according to comparison-of-substring-operation-performance-between-net-and-java , we can improve on the second algorithm (without using StringBuilder or String('5', ite)) by continually doubling the constructed string, adding an extra character when necessary:

private static string getStr(int p)
{
    if(p == 0)
        return "";
    if(p == 1)
        return "5";
    string s = getStr ((p+1) / 2);
    if( s.Length + s.Length == p )
        return s + s;
    else
        return s + s.Substring(0, p - s.Length);
}

For the number of characters c₂(n) copied by this algorithm we have

c₂(n) = n + c₂(⌈n/2⌉)

from which we may derive

c₂(n) = 2_n_ + d(n)

where d(n) is -1 if n is a power of 2 and is otherwise the number of “internal” (i.e. neither leading nor trailing) digits equal to 0 in the binary expansion of m; equivalently, d(n) is defined by the first matching case for m ∈ ℕ in:

d(2^m) = -1
d(2 m) = d(m)
d(m) = the number of essential (non-leading) 0 binary digits in m

The expression for c₂ may also be verified by complete induction and is O(n + log n), i.e. O(n).

It is fairly straightforward to remove the recursion from this algorithm.

In the OP’s case this algorithm copies c₂(10,000) = 20,000 + d(11000011010100000₂) = 20,006 characters and would thus appear to be some 7 times faster.

Other remarks

This analysis applies to creating a multiple of any string, not just "5".
The most efficient way of constructing the OP’s string is presumably String('5', ite).
If using a StringBuilder to build a string of known size, one may reduce allocations by using StringBuilder(capacity).
This analysis applies to other environments than .NET.
In C, allocate a buffer of the right size (including '\0'!), copy in the string to be repeated and then repeatedly append a copy of the filled part of the buffer until it is full.

performance while doing string concatenation - algorithm string strings c#

2 Answers2

Cost of the improved algorithm

Explanation: Comparison of costs

Further optimisation

Other remarks