Assuming - as per Matt Burland's answer -
that the time cost of creating a string of length n by one of the given algorithms is
dominated by the number of characters copied,
the observed timings may be largely explained by calculating this for both algorithms.
This yields O(n2) and O(n log n) and, for length 10,000, a ratio of 348:1. The algorithm may be improved to O(n) in Java, but apparently not in .NET.
Cost of the improved algorithm
Inspection of the improved algorithm shows that the number c(n) of characters copied obeys the following recurrence relation:
c(0) = 0
c(1) = 1
c(n) = c(⌊n/2⌋) + c(⌈n/2⌉) + n
This may be solved to yield
c(2k + a) = (k + 1)2k + (k + 3)a
where k and a are chosen so that n = 2k + a , a < 2k ; this is readily verified by complete induction.
This is O(k 2k), i.e. O(n log2n), i.e. O(n log n),
Explanation: Comparison of costs
The original algorithm clearly copies n(n+1)/2 characters, and is thus O(n2).
The revised algorithm clearly copies far fewer characters;
for the given 10,000 character string:
c(10000) =
c(213 + 1808) =
(13+1) * 8192 + 16 * 1808 =
143,616
while the original algorithm copies 50,005,000 characters, a ratio of approximately 1 : 348,
consistent to well within an order of magnitude with the observed ratio of 1:250.
The imperfect match does suggest that other factors such as memory management may be significant.
Further optimisation
Given that the string is filled with a single character,
and assuming that String.Substring
does not make a copy of the string,
which is true in Java but not .NET according to comparison-of-substring-operation-performance-between-net-and-java ,
we can improve on the second algorithm (without using StringBuilder
or String('5', ite)
)
by continually doubling the constructed string, adding an extra character when necessary:
private static string getStr(int p)
{
if(p == 0)
return "";
if(p == 1)
return "5";
string s = getStr ((p+1) / 2);
if( s.Length + s.Length == p )
return s + s;
else
return s + s.Substring(0, p - s.Length);
}
For the number of characters c2(n) copied by this algorithm we have
c2(n) = n + c2(⌈n/2⌉)
from which we may derive
c2(n) = 2_n_ + d(n)
where d(n) is -1 if n is a power of 2 and is otherwise the number of “internal” (i.e. neither leading nor trailing) digits equal to 0 in the binary expansion of m;
equivalently, d(n) is defined by the first matching case for m ∈ ℕ in:
d(2m) = -1
d(2 m) = d(m)
d(m) = the number of essential (non-leading) 0 binary digits in m
The expression for c2 may also be verified by complete induction and is O(n + log n), i.e. O(n).
It is fairly straightforward to remove the recursion from this algorithm.
In the OP’s case this algorithm copies
c2(10,000) = 20,000 + d(110000110101000002) = 20,006 characters
and would thus appear to be some 7 times faster.
Other remarks
- This analysis applies to creating a multiple of any string, not just
"5"
.
- The most efficient way of constructing the OP’s string is presumably
String('5', ite)
.
- If using a
StringBuilder
to build a string of known size, one may reduce allocations by using StringBuilder(capacity)
.
- This analysis applies to other environments than .NET.
- In C, allocate a buffer of the right size (including
'\0'
!), copy in the string to be repeated and then repeatedly append a copy of the filled part of the buffer until it is full.