29

What is an efficient way to trim whitespace from the end of a StringBuilder without calling ToString().Trim() and then building a new instance from the result, as in new StringBuilder(sb.ToString().Trim())?

Nicholas Petersen
  • 2
    The problem with this one, is that it uses the very subjective word "fastest" in its title. It makes it sound like a competition. – crthompson Jul 15 '14 at 23:26
  • 1
    I know that, since I did it myself after a few answers on my own questions and some hours of testing, but I didn't know doing it right away was "ok" – Fredou Jul 15 '14 at 23:26
  • @Bobo - 'Convert it to a string and .Trim()?' - is horribly unperformant. paqogomez - I will change to 'most performant' if that makes you feel better. See Bobo's answer, is typical (no offense), the kind of junk I found in related questions. Some of us C# devs actually care about things like not making a hundred or a thousand wasted allocs, I wish we could increase the number of devs who care about not being wasteful and slow needlessly. – Nicholas Petersen Jul 16 '14 at 00:07
  • 1
    NicholasPetersen, the fact that @Bobo did answer so simply, is because best practices in software development is about making it work, first of all. Then, if there are any related issues, you know where it counts to optimize. Otherwise, it is overkill-optimization. I say this because you look very arrogant to me because of the way you speak on here. It is not because we, C# programmers, don't care about performance. It is because we concentrate on what's important depending on the customer's requirements. We shall get performance optimization once it works, and once the customer requires it. – Will Marcouiller Jul 16 '14 at 00:17
  • And if you care that much about performance, write it in C/C++, or even in assembler working directly with the registers, without using any interrupts. There you'll have raw performance. – Will Marcouiller Jul 16 '14 at 00:23
  • 1
    @WillMarcouiller I did not mean to sound arrogant. I do feel us C# devs who have a reasonable care for performance and for not being wasteful get beat up all the time in *exactly* situations like this, situations where we should have received ZERO flak, but there it is, here we are being beat up again. Note: 1) it took me about 5 minutes to write this function, 2) it was an extension method that could be used for the rest of one's days, 3) in some scenarios this would be THOUSANDS of times more performant, and yet you act like I put forth some ugly micro-optimization code. – Nicholas Petersen Jul 16 '14 at 00:45
  • 3
    @NicholasPetersen Perhaps providing some metrics would move this out of the realm of subjectivity. I agree that your approach would be fast if you intend on keeping the `StringBuilder`. If you intend on discarding it and using the resultant `string` prior to needing to trim, then a `TrimEnd()` on that string will be faster. I'd be interested to see a case where your code is **thousands** of times faster than any other implementation. – Ryan Emerle Jul 16 '14 at 01:18
  • @RyanEmerle thank you for your response. Yes, the only circumstance that one would do this is when they were keeping the StringBuilder instance. Which was my case. Yes, thousands, easily, especially when you consider a trim might many times not even be needed. Even when needed, let's say your SB has 1000+ characters, it merely iterates let's say 2 back until finding a non-whitespace. Setting Length then does *nothing* in stringbuilder except internally setting the length integer. No allocs, etc. – Nicholas Petersen Jul 16 '14 at 01:27
  • "is thousands of times faster than any other implementation." - I did not say with regard to any implementation, but with regard to the ToString().TrimEnd() then back to StringBuilder again method alone. – Nicholas Petersen Jul 16 '14 at 01:32
  • Just to add a new consideration to this discussion. [Look at the accepted answer in this question](http://stackoverflow.com/questions/24710770/how-to-restrict-a-content-of-string-to-less-than-4mb-and-save-that-string-in-db/24710872?noredirect=1#comment38324557_24710872). I don't know if it applies also here but it seems a thing to be considered – Steve Jul 16 '14 at 07:50
  • Thanks Steve, lots of good StringBuilder discussions there. – Nicholas Petersen Jul 16 '14 at 08:14
  • 1
    Oh, come on! What is possibly __'opinion-based'__ in looking for the 'fastest' solution?? I'd be hard put finding a __more objective__ objective! – TaW Jul 16 '14 at 08:18
  • @NicholasPetersen: I now get your point better. I partially agree with you, on the basis that many .NET programmers don't strive for performance for the reasons I mentioned above. It is also easier not to care much about such micro-optimization because computer hardware has improved a lot during the last 10 years. Historically, programmers had to take care of every single bit they used because resources were scarce. Then programmers got stuck because of the limits of the machines. Then it was time for big hardware improvements, and that is what happened. – Will Marcouiller Jul 16 '14 at 13:13
  • Now, younger programmers didn't know this era where computers were lacking of resources. Before you really slow down a good hardware these days, and really notice the difference on the user point of view, you have to really be neglecting memory usage and not being using some `using` blocks, etc. Depending on your needs, it might be relevant to optimize such behaviour. I agree with @RyanEmerle. It would be interesting to see the code of two different unit tests which demonstrate the delta between the time required for both tries. other community users could then try it at home! ;) – Will Marcouiller Jul 16 '14 at 13:18
  • 1
    @WillMarcouiller thanks for the nice comments and for the historical perspective. – Nicholas Petersen Jul 16 '14 at 15:38
  • 1
    @NicholasPetersen you didn't specify at first that you wanted a SB forever. My comment is still valid for people who want to get a string as their end result. – Bobo Jul 16 '14 at 16:05
  • @Bobo okay, but isn't that a misreading on your part? Where did I ever say anything about converting it to a string? Also, even though this isn't the use case I had in mind, even when converting it to a string, it is still twice the waste to convert sb.ToString().TrimEnd(). Why? Because that ultimately creates *two* separate strings (because string functions like Trim return a new instance), whereas if you did sb.TrimEnd().ToString(), you took care of the very minor trim op (often just a few characters) at the SB stage. – Nicholas Petersen Jul 16 '14 at 17:53
  • Sure, I guess it was a misreading, but that is why having a clear question (like it is now after you edited it) is better. And I actually do like your extension, it seems very useful in these cases. But you didn't have to instantly turn rude and assume we are all idiots if we don't agree with you right away. Especially since for most people, the amount of "waste" without using your extension is minimal and optimizing it could be considered overkill. – Bobo Jul 16 '14 at 18:44
  • @Bobo Sounds good Bobo, I apologize for getting rude. – Nicholas Petersen Jul 16 '14 at 19:09

7 Answers

54

The following is an extension method, so you can call it like this:

sb.TrimEnd();

Also, it returns the SB instance, allowing you to chain other calls (sb.TrimEnd().AppendLine()).

public static StringBuilder TrimEnd(this StringBuilder sb)
{
    if (sb == null || sb.Length == 0) return sb;

    int i = sb.Length - 1;

    for (; i >= 0; i--)
        if (!char.IsWhiteSpace(sb[i]))
            break;

    if (i < sb.Length - 1)
        sb.Length = i + 1;

    return sb;
}

Notes:

  1. If Null or Empty, returns.

  2. If no Trim is actually needed, we're talking a very quick return time, with probably the most expensive call being the single call to char.IsWhiteSpace. So practically zero expense to call TrimEnd when not needed, as opposed to these ToString().Trim() back to SB routes.

  3. Else, the most expensive thing, if trim is needed, is the multiple calls to char.IsWhiteSpace (breaks on first non-whitespace char). Of course, the loop iterates backwards; if all are whitespace you'll end up with a SB.Length of 0.

  4. If whitespaces were encountered, the i index is kept outside the loop which allows us to cut the Length appropriately with it. In StringBuilder, this is incredibly performant, it simply sets an internal length integer (the internal char[] is kept the same internal length).
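As a rough, self-contained sketch of the notes above, the method can be placed in a static class and exercised against the edge cases listed. The class names `StringBuilderTrimExtensions` and `Demo` are hypothetical, added only so the example compiles on its own.

```csharp
using System;
using System.Text;

public static class StringBuilderTrimExtensions
{
    // Same logic as the method above, repeated so the sketch compiles on its own.
    public static StringBuilder TrimEnd(this StringBuilder sb)
    {
        if (sb == null || sb.Length == 0) return sb;

        int i = sb.Length - 1;
        for (; i >= 0; i--)
            if (!char.IsWhiteSpace(sb[i]))
                break;

        if (i < sb.Length - 1)
            sb.Length = i + 1;

        return sb;
    }
}

public static class Demo
{
    public static void Main()
    {
        // Note 1: null and empty inputs are returned untouched.
        StringBuilder none = null;
        Console.WriteLine(none.TrimEnd() == null);                        // True
        Console.WriteLine(new StringBuilder().TrimEnd().Length);          // 0

        // Note 2: nothing to trim, so Length is never assigned.
        Console.WriteLine(new StringBuilder("abc").TrimEnd().Length);     // 3

        // Note 3: an all-whitespace builder collapses to Length 0.
        Console.WriteLine(new StringBuilder(" \t\r\n").TrimEnd().Length); // 0

        // Note 4: trailing whitespace is cut, and chaining still works.
        var sb = new StringBuilder("cool  \t \r\n ");
        Console.WriteLine(sb.TrimEnd().Append('!').ToString());           // cool!
    }
}
```

Calling an extension method on a null reference is legal because it compiles to a static call, which is why the null check inside the method is reachable.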

Update: See excellent notes by Ryan Emerle as follows, which correct some of my misunderstandings (the internal workings of SB are a little more complicated than I made it out to be):

The StringBuilder is technically a linked list of blocks of char[] so we don't end up in the LOH. Adjusting the length isn't technically as simple as changing the end index because if you move into a different chunk the Capacity must be maintained, so a new chunk may need to be allocated. Nevertheless, you only set the Length property at the end, so this seems like a great solution. Relevant details from Eric Lippert: https://stackoverflow.com/a/6524401/62195

Also, see this nice article discussing the .NET 4.0 new StringBuilder implementation: http://1024strongoxen.blogspot.com/2010/02/net-40-stringbuilder-implementation.html

Update: Following illustrates what happens when a StringBuilder Length is altered (the only real operation done to the SB here, and that only when needed):

StringBuilder sb = new StringBuilder("cool  \t \r\n ");

Console.WriteLine(sb.Capacity); // 16
Console.WriteLine(sb.Length);   // 11

sb.TrimEnd();

Console.WriteLine(sb.Capacity); // 16
Console.WriteLine(sb.Length);   // 4

You can see the internal array (m_ChunkChars) stays the same size after changing the Length, and in fact, you can see in the debugger it doesn't even overwrite the (in this case whitespace) characters. They are orphaned is all.
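Several comments on the question ask for metrics. The following is only a rough timing sketch, not a rigorous benchmark and not part of the original answer: the class name `TrimTimingSketch`, the iteration count, and the payload size are arbitrary assumptions, and the numbers will vary with runtime and hardware. It compares the in-place trim with the ToString().TrimEnd() round trip into a new StringBuilder.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class TrimTimingSketch
{
    // In-place trim: same idea as the extension method in this answer.
    public static StringBuilder TrimEndInPlace(StringBuilder sb)
    {
        int i = sb.Length - 1;
        while (i >= 0 && char.IsWhiteSpace(sb[i])) i--;
        if (i < sb.Length - 1) sb.Length = i + 1;
        return sb;
    }

    // The round trip the question wants to avoid:
    // full string, trimmed string, then a brand-new builder.
    public static StringBuilder TrimEndRoundTrip(StringBuilder sb)
    {
        return new StringBuilder(sb.ToString().TrimEnd());
    }

    public static void Main()
    {
        const int iterations = 100000;           // arbitrary, for illustration only
        string payload = new string('x', 1000);  // arbitrary builder content

        var a = new StringBuilder(payload);
        var watch = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++)
        {
            a.Append(" \r\n");                   // re-add whitespace so every pass has work to do
            TrimEndInPlace(a);
        }
        watch.Stop();
        Console.WriteLine("in place:   " + watch.ElapsedMilliseconds + " ms");

        var b = new StringBuilder(payload);
        watch.Restart();
        for (int n = 0; n < iterations; n++)
        {
            b.Append(" \r\n");
            b = TrimEndRoundTrip(b);
        }
        watch.Stop();
        Console.WriteLine("round trip: " + watch.ElapsedMilliseconds + " ms");

        // Both approaches should produce the same text.
        Console.WriteLine(a.ToString() == b.ToString());
    }
}
```

For anything beyond a quick sanity check, a dedicated benchmarking harness with warm-up and multiple runs would give more trustworthy numbers than a single Stopwatch measurement.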

H. Pauwelyn
Nicholas Petersen
  • 1
    Would you consider explaining that code, and why it meets the requirements of the question? That way it can help future readers learn. – Andrew Barber Jul 16 '14 at 00:50
  • Sure, will put in the body. – Nicholas Petersen Jul 16 '14 at 02:00
  • Cool. Just for clarity; does this method avoid rebuilding the output string each character? The indexer just hits the internal buffer, right? – Andrew Barber Jul 16 '14 at 04:02
  • 1
    Right, the indexer accesses the internal char array (see sb.Capacity for its size); StringBuilder is really just a glorified char[] with a Length field, which acts as a pointer to where to add to the internal array. Importantly, the only operation this method does to the SB is alter the Length field if needed, but this does *not* make the internal char[] cut in size (it only grows). If it did, that would require a new array alloc & copy, which would defeat the purpose. Thus: a full Trim wouldn't make sense (and it would be so rarely needed anyways), because that requires altering the internal array. – Nicholas Petersen Jul 16 '14 at 06:05
  • Excellent, and good edit. Might add a relevant question about in the question itself, too. That'll be something on many people's minds – Andrew Barber Jul 16 '14 at 12:35
  • 2
    The `StringBuilder` is technically a linked list of blocks of `char[]` so we don't end up in the [LOH](http://msdn.microsoft.com/en-us/magazine/cc534993.aspx). Adjusting the length isn't *technically* as simple as changing the end index because if you move into a different chunk the `Capacity` must be maintained, so a new chunk may need to be allocated. Nevertheless, you only set the `Length` property at the end, so this seems like a great solution. – Ryan Emerle Jul 16 '14 at 14:27
  • @RyanEmerle Adjusting the `Length` isn't a huge deal because conceptually there's no real way of avoiding actually removing those characters no matter what you do. The somewhat more concerning issue is the indexer, which is used an unknown number of times and that isn't as easy as just getting the item from the array, as it needs to find the correct chunk to index, making it somewhat more work. – Servy Jul 16 '14 at 16:17
  • If part of @RyanEmerle 's concern was that the length was only being set at the end, does that raise concern if you want to do stuff like append to the SB afterwards? Does this depend somewhat on the framework version? – Panzercrisis Jul 19 '16 at 13:06
  • I'm having a little bit of trouble following the part about moving into another chunk. I understand that it probably involves appending into the SB more characters than will fit into the pre-existing arrays, either the orphaned ones or the last one actually being used, but does that mean it won't simply overwrite the orphaned ones to save space and time? – Panzercrisis Jul 19 '16 at 13:08
  • 2
    We had problems serializing a 270 MB JSON string. After switching to this method,the time needed, for a Release build, went from 22 minutes to 12 seconds. – cskwg Apr 03 '19 at 20:14
  • I really like this answer! I expanded on the idea to create a TrimStart()-variant as well, and encapsulated it into a StringBuilderExtensions partial class. Available as a freely available gist here: https://gist.github.com/ST-Emanuel/a079845848369e1f78eb2931f39e831c – Emanuel Strömgren Sep 10 '19 at 09:22
  • 1
    Thanks @EmanuelStrömgren! I would recommend against a TrimStart because I don’t believe `sb.Remove` is performant, though I could stand to be corrected. It seems a better way is just to wait till the sb has to be serialized to a string and to trim it at that time, ie when sb.ToString is called, as it allows a start index to be passed in. I wrote an extension method for this here: https://github.com/copernicus365/DotNetXtensions/blob/master/DotNetXtensions/src/XStringBuilder.cs#L314 – Nicholas Petersen Sep 10 '19 at 09:38
  • That is definitely true, I must argue though that the use of TrimStart is at times handy to have while still modifying the string. – Emanuel Strömgren Sep 10 '19 at 09:59
3

You can try this:

StringBuilder b = new StringBuilder();
b.Append("some words");
b.Append(" to test   ");

int count = 0;
for (int i = b.Length - 1; i >= 0; i--)
{
    if (b[i] == ' ')
        count++;
    else
        break;
}

b.Remove(b.Length - count, count);
string result = b.ToString();

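It iterates backwards from the end for as long as it finds spaces, then breaks out of the loop at the first other character. Note that it only checks for the space character (' '), not for other whitespace such as tabs or line breaks.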

Or even like this:

StringBuilder b = new StringBuilder();
b.Append("some words");
b.Append(" to test   ");

// Guard on Length so an empty or all-whitespace builder does not throw.
while (b.Length > 0 && char.IsWhiteSpace(b[b.Length - 1]))
{
    b.Remove(b.Length - 1, 1);
}

string get = b.ToString();
terrybozzio
1
public static class StringBuilderExtensions
{
    public static StringBuilder Trim(this StringBuilder builder)
    {
        if (builder.Length == 0)
            return builder;

        var count = 0;
        for (var i = 0; i < builder.Length; i++)
        {
            if (!char.IsWhiteSpace(builder[i]))
                break;
            count++;
        }

        if (count > 0)
        {
            builder.Remove(0, count);
            count = 0;
        }

        for (var i = builder.Length - 1; i >= 0; i--)
        {
            if (!char.IsWhiteSpace(builder[i]))
                break;
            count++;
        }

        if (count > 0)
            builder.Remove(builder.Length - count, count);

        return builder;
    }
}
Smagin Alexey
  • Nice idea, the problem is, it's not performant I believe trimming from the beginning. It seems like a better idea therefore is to have a final trim operation when getting the string. So imagine if your method returned a string let's say called `TrimToString`, if the beginning needs trimmed you use the ToString overload to set the beginning index to start getting the string from (and trim the end first the normal way). I've been using this for a while, see new post with it in a min. – Nicholas Petersen Apr 18 '19 at 19:54
  • Removing whitespace from the end is more performant if I set Length (like in your example). But for the start you convert the StringBuilder to a string, whereas I want to return a StringBuilder, which is why I use Remove. If you want to return a string, you can make your method more performant: remember the valid start index and the valid end index (without setting Length or calling Remove), and at the end of the method call ToString(validStartIndex, validEndIndex - validStartIndex + 1) – Smagin Alexey Apr 20 '19 at 06:49
  • I took this for TrimStart and Petersen's for TrimEnd - i think this is best one can do really – Boppity Bop Aug 24 '20 at 14:52
1

To do a full trim, it is not advisable (or performant) to do it at the StringBuilder level. Do it at ToString time instead, as in this TrimToString implementation:

public static string TrimToString(this StringBuilder sb)
{
    if (sb == null) return null;

    sb.TrimEnd(); // handles null and is very inexpensive, unlike a trim at the start

    if (sb.Length > 0 && char.IsWhiteSpace(sb[0])) {
        for (int i = 0; i < sb.Length; i++)
            if (!char.IsWhiteSpace(sb[i]))
                return sb.ToString(i, sb.Length - i); // the built-in overload takes (startIndex, length)
        return ""; // shouldn't reach here, because TrimEnd should have caught all-whitespace content, but ...
    }

    return sb.ToString();
}
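A comment on another answer suggests a variant that leaves the builder untouched: remember the first and last non-whitespace indexes and pass them to ToString. The following is a minimal sketch of that idea, assuming a read-only trim is acceptable. The names `StringBuilderReadOnlyTrim` and `ToTrimmedString` are hypothetical and not from the thread.

```csharp
using System;
using System.Text;

public static class StringBuilderReadOnlyTrim
{
    // Hypothetical helper: returns the trimmed text without modifying the builder.
    public static string ToTrimmedString(this StringBuilder sb)
    {
        if (sb == null) return null;

        int start = 0;
        int end = sb.Length - 1;

        // Skip leading whitespace.
        while (start <= end && char.IsWhiteSpace(sb[start])) start++;
        // Skip trailing whitespace.
        while (end >= start && char.IsWhiteSpace(sb[end])) end--;

        // Empty or all-whitespace content yields an empty string.
        return start > end ? string.Empty : sb.ToString(start, end - start + 1);
    }

    public static void Main()
    {
        var sb = new StringBuilder("  \t padded text \r\n");
        int before = sb.Length;

        Console.WriteLine("[" + sb.ToTrimmedString() + "]"); // [padded text]
        Console.WriteLine(sb.Length == before);              // True, the builder is unchanged
    }
}
```

The trade-off is that the whitespace stays in the builder, so this only fits the case where the string is the end result.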
Nicholas Petersen
1

I extended Nicholas Petersen's version to accept optional additional chars:

/// <summary>
/// Trims the end of the StringBuilder content. By default, only whitespace characters are trimmed.
/// </summary>
/// <param name="pTrimChars">Array of additional chars to be trimmed.</param>
/// <returns></returns>
public static StringBuilder TrimEnd(this StringBuilder pStringBuilder, char[] pTrimChars = null)
{
    if (pStringBuilder == null || pStringBuilder.Length == 0)
        return pStringBuilder;

    int i = pStringBuilder.Length - 1;

    // The HashSet<char>(IEnumerable<char>) constructor avoids depending on LINQ's ToHashSet().
    var lTrimChars = pTrimChars != null
        ? new HashSet<char>(pTrimChars)
        : new HashSet<char>();

    for (; i >= 0; i--)
    {
        var lChar = pStringBuilder[i];
        if ((char.IsWhiteSpace(lChar) == false) && (lTrimChars.Contains(lChar) == false))
            break;
    }

    if (i < pStringBuilder.Length - 1)
        pStringBuilder.Length = i + 1;

    return pStringBuilder;
}

Edit: After Nicholas Petersen's suggestion:

/// <summary>
/// Trims the end of the StringBuilder content. By default, only whitespace characters are trimmed.
/// </summary>
/// <param name="pTrimChars">Set of additional chars to be trimmed. Passing a HashSet is a little more efficient than passing a char[].</param>
/// <returns></returns>
public static StringBuilder TrimEnd(this StringBuilder pStringBuilder, HashSet<char> pTrimChars = null)
{
    if (pStringBuilder == null || pStringBuilder.Length == 0)
        return pStringBuilder;

    int i = pStringBuilder.Length - 1;

    for (; i >= 0; i--)
    {
        var lChar = pStringBuilder[i];

        if (pTrimChars == null)
        {
            if (char.IsWhiteSpace(lChar) == false)
                break;
        }
        else if ((char.IsWhiteSpace(lChar) == false) && (pTrimChars.Contains(lChar) == false))
            break;
    }

    if (i < pStringBuilder.Length - 1)
        pStringBuilder.Length = i + 1;

    return pStringBuilder;
}
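A short usage sketch for the HashSet<char> overload, assuming the typical case of stripping a trailing separator after a join-style loop. The method body is a condensed restatement of the one above so the example compiles on its own, and the class name `TrimEndWithCharsSketch` is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TrimEndWithCharsSketch
{
    // Condensed restatement of the HashSet<char> overload above.
    public static StringBuilder TrimEnd(this StringBuilder sb, HashSet<char> trimChars = null)
    {
        if (sb == null || sb.Length == 0) return sb;

        int i = sb.Length - 1;
        for (; i >= 0; i--)
        {
            char c = sb[i];
            bool isExtra = trimChars != null && trimChars.Contains(c);
            if (!char.IsWhiteSpace(c) && !isExtra)
                break;
        }

        if (i < sb.Length - 1)
            sb.Length = i + 1;

        return sb;
    }

    public static void Main()
    {
        // Build the set once and reuse it across calls.
        var separators = new HashSet<char> { ',', ';' };

        var sb = new StringBuilder();
        foreach (var item in new[] { "a", "b", "c" })
            sb.Append(item).Append(", ");

        Console.WriteLine(sb.TrimEnd(separators).ToString()); // a, b, c
    }
}
```

Creating the set once at the call site is the point of the second version: the first version builds a new HashSet on every call.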
Dexit
0

If you know how many whitespace characters you want to remove, you could use StringBuilder.Remove(int startIndex, int length); there is no need to create an extension method.

Hope it will help!
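A minimal sketch of that suggestion, assuming the amount of trailing padding is known in advance. The class name `KnownPaddingSketch` and the value of `padding` are made up for illustration.

```csharp
using System;
using System.Text;

public static class KnownPaddingSketch
{
    public static void Main()
    {
        const int padding = 3;              // assumed to be known in advance
        var sb = new StringBuilder("value");
        sb.Append(' ', padding);            // "value   "

        // Remove the last `padding` characters.
        sb.Remove(sb.Length - padding, padding);
        Console.WriteLine("[" + sb + "]");  // [value]

        // For a trailing run, shrinking Length gives the same result.
        sb.Append(' ', padding);
        sb.Length -= padding;
        Console.WriteLine("[" + sb + "]");  // [value]
    }
}
```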

jin bai
-1
StringBuilder myString = new StringBuilder("This is Trim test ");

if (myString[myString.Length - 1].ToString() == " ")
{              
    myString = myString.Remove(myString.Length - 1, 1);
}
shA.t
  • 1
    1) This does not trim multiple trailing whitespaces, 2) the only whitespace it checks for is a space, 3) no need to turn the char in line 1 into a string, just compare it as a char, if you were going that route (` == ' '`), 4) I would have to check how Remove works when at the end of the SB, but certainly it's not going to be faster than just changing the Length, as others have suggested below. – Nicholas Petersen Aug 20 '18 at 16:12