0

I have a string and I want to insert a specific character, \n for example, at many indices, which I have stored in a List. Is there an efficient way of modifying the string without first splitting it? I care more about time than space in this case.

If this helps, the function is within a Singleton class, and is being used to reformat the string which will then be rendered to a game scene upon return (this for a game in Unity).

DonCarleone
  • 544
  • 11
  • 20
  • 3
    Yes? A `StringBuilder` and then you do it char-by-char. And I don't see why you should split it. – xanatos Dec 28 '20 at 22:38
  • Would a StringBuilder be the most efficient way? – DonCarleone Dec 28 '20 at 22:39
  • Strings are immutable and .NET unfortunately has an anaemic string library - your only option is a StringBuilder, but why not use a linked-list of substrings? – Dai Dec 28 '20 at 22:39
  • What will you do with the resulting string? For example, if you then write to the stream, it makes sense to write to it immediately, without an intermediate buffer in the form of a StringBuilder. – Alexander Petrov Dec 28 '20 at 22:39
  • I totally agree with using a StringBuilder and writing the pieces of the string and your insert-character into (not necessarily char by char if you are using SubString) – Tobias Raphael Dieckmann Dec 28 '20 at 22:40
  • @AlexanderPetrov This is in a function. Basically "beautifying" the string then returning it. – DonCarleone Dec 28 '20 at 22:42
  • It seems like you should be using an array of chars (instead of a string) that can be cloned per usage and updated, then converted to a new string. – David L Dec 28 '20 at 22:42
  • @DavidL Your method would work, but because I am given a string to start with, I would have to convert it first to a character array, to the necessary operations and then convert it back to a string. – DonCarleone Dec 28 '20 at 22:46
  • You have to convert it to something at some point regardless. A string is immutable. You could take it a step further and use a Span and save yourself intermediary allocations. It isn't clear how your strings are loaded, modified, and then used. If you are storing them for long periods of time and then performing multiple projections, constantly looping with a string builder might be suboptimal. – David L Dec 28 '20 at 22:47

2 Answers2

3

A first version could be this:

public static string InsertStringAt(string str, int[] indexes, string insert)
{
    if (indexes.Length == 0 || insert.Length == 0)
    {
        return str;
    }

    var sb = new StringBuilder(str.Length + indexes.Length * insert.Length);

    int ixIndexes = 0;

    // We NEED ordered indexes
    // We don't order "in place" indexes because it is bad
    // modifying input parameters
    var indexes2 = indexes.OrderBy(x => x).ToArray();

    int i = 0;

    for (; i < str.Length; i++)
    {
        while (ixIndexes < indexes2.Length && i == indexes2[ixIndexes])
        {
            if (indexes2[ixIndexes] < 0)
            {
                // index < 0
                throw new ArgumentOutOfRangeException(nameof(indexes));
            }

            sb.Append(insert);
            ixIndexes++;
        }

        sb.Append(str[i]);
    }

    while (ixIndexes < indexes2.Length && i == indexes2[ixIndexes])
    {
        if (indexes2[ixIndexes] < 0)
        {
            // index < 0
            throw new ArgumentOutOfRangeException(nameof(indexes));
        }

        sb.Append(insert);
        ixIndexes++;
    }

    if (ixIndexes != indexes2.Length)
    {
        // Some indexes were > indexes.Length
        throw new ArgumentOutOfRangeException(nameof(indexes));
    }

    return sb.ToString();
}

Note that there is a possible optimization, but the optimization will render the code more complex. Instead of adding the chars one at a time, we could add all the substrings of str between the indexes.

Second version... In the end it wasn't more complex:

public static string InsertStringAt(string str, int[] indexes, string insert)
{
    if (indexes.Length == 0 || insert.Length == 0)
    {
        return str;
    }

    var indexes2 = indexes.OrderBy(x => x).ToArray();

    var sb = new StringBuilder(str.Length + indexes.Length * insert.Length);

    int j = 0;

    for (int i = 0; i < indexes2.Length; i++)
    {
        if (indexes2[i] < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(indexes2));
        }

        if (indexes2[i] > str.Length)
        {
            throw new ArgumentOutOfRangeException(nameof(indexes2));
        }

        sb.Append(str, j, indexes2[i] - j);
        sb.Append(insert);

        j = indexes2[i];
    }

    sb.Append(str, j, str.Length - j);

    return sb.ToString();
}
xanatos
  • 109,618
  • 12
  • 197
  • 280
  • Note in general `ToList` is preferable to `ToArray` unless you really need an array. – NetMage Dec 28 '20 at 22:54
  • 2
    @NetMage, no, it isn't in this situation. The size of the array is known and will not be resized so there is absolutely no reason to use a list, especially because it is not part of a public api but an internal implementation detail. – David L Dec 28 '20 at 22:57
  • @DavidL The use of `ToArray` on an `OrderBy` will generate an extra allocation every time it is called for no purpose. – NetMage Dec 28 '20 at 22:59
  • 2
    As will `ToList`. The entire point of the code is to copy the input array to a separate collection. Either ToList or ToArray would do that, but in this case the array returned from ToArray will be more efficient than ToList since there is no possibility of resizing the underlying array storage. – David L Dec 28 '20 at 22:59
  • Some years ago I had done some microbenchmark... For small `IEnumerable` `ToArray` was FASTER than `ToList`, even while the `ToArray` did two allocations (one to an extensible scratch buffer and the other to the final array). In .NET Core they added some optimizations, so that many LINQ methods "save" the length of the collection (a five elements OrderByed will by still five elements). With that optimization I imagine the ToArray won't beed a scratch buffer. If I wanted to microoptimize I would first to a check to see if the array need ordering, and if yes I would `Clone()` and `Array.Sort()` – xanatos Dec 28 '20 at 23:03
  • @DavidL In the general case, `ToArray` requires one additional allocation after the result is built to downsize to the exact final array size. – NetMage Dec 28 '20 at 23:04
  • @NetMage As I've said, did some microbenchmarks... The `List<>` is slow(er) than an optimized scratch buffer. – xanatos Dec 28 '20 at 23:04
  • @xanatos I appreciate the optimizations in .Net Core for countable `IEnumerable` to make `ToArray` more efficient, but that seems like an implementation decision in the framework you shouldn't count on? – NetMage Dec 28 '20 at 23:05
  • @NetMage But I did make benchmarks on .NET Framework :-) – xanatos Dec 28 '20 at 23:06
  • @NetMage Even though this is suggested by the accepted answer to this question (https://stackoverflow.com/questions/1105990/is-it-better-to-call-tolist-or-toarray-in-linq-queries), benchmarks show that `ToArray` is more performant: https://stackoverflow.com/a/60103725/689923 Also it's a bit less mutable than a list, so I personally find it a cleaner data structure for APIs. So unless you really need a list with dynamic length, I would generally prefer an array. – Mo B. Dec 28 '20 at 23:06
  • @MoB. I think for APIs `Span` is the solution, its limitations aside. – NetMage Dec 28 '20 at 23:07
  • @xanatos And Mono, and macOS, and ARM, ... :) – NetMage Dec 28 '20 at 23:08
  • I would suggest naming the method `InsertStringAt` as well. – NetMage Dec 28 '20 at 23:08
  • @NetMage, I apologize, I interpreted your comment as referring to the resulting new allocation, not any additional intermediary allocations. I'd be curious to see what versions of .net framework contain the intermediary allocation. In .Net 4.8, I don't see the downsizing allocation although I could certainly be missing it: https://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs,998 vs https://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,329 – David L Dec 28 '20 at 23:15
  • @DavidL https://github.com/microsoft/referencesource/blob/master/System.Core/System/Linq/Enumerable.cs#L944 and then https://github.com/microsoft/referencesource/blob/master/System.Core/System/Linq/Enumerable.cs#L2680 – xanatos Dec 28 '20 at 23:17
  • @NetMage Concur with the naming. – xanatos Dec 28 '20 at 23:17
  • @xanatos ahh, perfect, much appreciated. – David L Dec 28 '20 at 23:18
  • @xanatos In your new version, why not just test the last `indexes2` against `str.Length` instead of each one as you go? BTW, really like the three argument `Append` to `StringBuilder`. – NetMage Dec 28 '20 at 23:25
  • Thanks for the solution :) This solution can also be modified to work for a dict mapping indices to strings-to-insert. – DonCarleone Dec 28 '20 at 23:41
2

I suggest a string extension method that copies slices between positions into a StringBuilder and inserts the char at each position:

public static class StringExt {
    public static string InsertCharAt(this string src, char ins, IEnumerable<int> poss) {
        var orderedPoss = poss.ToList();
        orderedPoss.Sort();

        var ans = new StringBuilder(src.Length+orderedPoss.Count);

        var srcSpan = src.AsSpan();
        var beginPos = 0;
        foreach (var insPos in orderedPoss) {
            ans.Append(srcSpan.Slice(beginPos, insPos-beginPos));
            ans.Append(ins);
            beginPos = insPos;
        }
        return ans.ToString();
    }
}

NOTE: If you don't have Span, the three argument StringBuilder.Append method can be used instead (as @xanatos did).

PS: Per @pingfloydx33 a single allocation version using String.Create:

public static string InsertCharAt2(this string src, char ins, IEnumerable<int> poss) {
    var orderedPoss = poss.ToList();
    orderedPoss.Sort();

    return String.Create(src.Length + orderedPoss.Count, new { }, (ans, _) => {
        var srcSpan = src.AsSpan();
        var beginPos = 0;
        var writePos = 0;
        foreach (var insPos in orderedPoss) {
            var subLen = insPos-beginPos;
            srcSpan.Slice(beginPos, subLen).CopyTo(ans.Slice(writePos));
            writePos += subLen;
            ans[writePos++] = ins;
            beginPos = insPos;
        }
    });
NetMage
  • 26,163
  • 3
  • 34
  • 55
  • With a little extra manual counting, could also use string.Create... Though not sure if that'd be any more performant as you'd have to keep checking the indices. Just spitballing – pinkfloydx33 Dec 28 '20 at 23:39
  • @pinkfloydx33 It does remove unnecessary allocations, and `StringBuilder.Append` has to track the write position as well, so not sure it has more overhead. `Span.CopyTo` is range checked, so that may be good enough. – NetMage Dec 29 '20 at 01:02