10

I had an interview question that asked me for my 'feedback' on a piece of code a junior programmer wrote. They hinted there may be a problem and said it will be used heavily on large strings.

public string ReverseString(string sz)
{
    string result = string.Empty;
    for(int i = sz.Length-1; i>=0; i--)
    {
      result += sz[i]
    }
    return result;
}

I couldn't spot it. I saw no problems whatsoever. In hindsight I could have said the user should resize but it looks like C# doesn't have a resize (i am a C++ guy).

I ended up writing things like use an iterator if its possible, [x] in containers could not be random access so it may be slow. and misc things. But I definitely said I never had to optimize C# code so my thinking may have not failed me on the interview.

I wanted to know, what is the problem with this code, do you guys see it?

-edit-

I changed this into a wiki because there can be several right answers. Also i am so glad i explicitly said i never had to optimize a C# program and mentioned the misc other things. Oops. I always thought C# didnt have any performance problems with these type of things. oops.

  • Keep in mind this is more of a puzzle than an actual problem. In real life, you can generally just reverse the string in the manner most convenient, and move on. Only come back after you're sure it's causing performance problems (it usually won't) – Rex M Jun 17 '09 at 22:35
  • 3
    this is not real world, its job interview. – IAdapter Jun 29 '09 at 22:47

12 Answers12

58

Most importantly? That will suck performance wise - it has to create lots of strings (one per character). The simplest way is something like:

public static string Reverse(string sz) // ideal for an extension method
{
    if (string.IsNullOrEmpty(sz) || sz.Length == 1) return sz;
    char[] chars = sz.ToCharArray();
    Array.Reverse(chars);
    return new string(chars);
}
Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
36

The problem is that string concatenations are expensive to do as strings are immutable in C#. The example given will create a new string one character longer each iteration which is very inefficient. To avoid this you should use the StringBuilder class instead like so:

public string ReverseString(string sz)
{
    var builder = new StringBuilder(sz.Length);
    for(int i = sz.Length-1; i>=0; i--)
    {
      builder.Append(sz[i]);
    }
    return builder.ToString();
}

The StringBuilder is written specifically for scenarios like this as it gives you the ability to concatenate strings without the drawback of excessive memory allocation.

You will notice I have provided the StringBuilder with an initial capacity which you don't often see. As you know the length of the result to begin with, this removes needless memory allocations.

What normally happens is it allocates an amount of memory to the StringBuilder (default 16 characters). Once the contents attempts to exceed that capacity it doubles (I think) its own capactity and carries on. This is much better than allocating memory each time as would happen with normal strings, but if you can avoid this as well it's even better.

Garry Shutler
  • 32,260
  • 12
  • 84
  • 119
  • I have nothing to do with it but consider if the person accidentally hit down then hit up. Logically it shouldnt show in recent activities but it is possible. I only thought of this because when i first got on this site (5mo ago) I tested up then down then up voting. Just to see if i was able to do it. –  Jun 17 '09 at 22:20
  • 3
    Garry: Get used to it. Many times people downvote correct answers without commenting. – Mehrdad Afshari Jun 17 '09 at 22:31
21

A few comments on the answers given so far:

  • Every single one of them (so far!) will fail on surrogate pairs and combining characters. Oh the joys of Unicode. Reversing a string isn't the same as reversing a sequence of chars.
  • I like Marc's optimisation for null, empty, and single character inputs. In particular, not only does this get the right answer quickly, but it also handles null (which none of the other answers do)
  • I originally thought that ToCharArray followed by Array.Reverse would be the fastest, but it does create one "garbage" copy.
  • The StringBuilder solution creates a single string (not char array) and manipulates that until you call ToString. There's no extra copying involved... but there's a lot more work maintaining lengths etc.

Which is the more efficient solution? Well, I'd have to benchmark it to have any idea at all - but even so that's not going to tell the whole story. Are you using this in a situation with high memory pressure, where extra garbage is a real pain? How fast is your memory vs your CPU, etc?

As ever, readability is usually king - and it doesn't get much better than Marc's answer on that front. In particular, there's no room for an off-by-one error, whereas I'd have to actually put some thought into validating the other answers. I don't like thinking. It hurts my brain, so I try not to do it very often. Using the built-in Array.Reverse sounds much better to me. (Okay, so it still fails on surrogates etc, but hey...)

Community
  • 1
  • 1
Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • 16
    If I ever write a language, I'm going to implement string.Reverse() just to avoid silly interview questions like this! – Garry Shutler Jun 17 '09 at 22:24
  • 3
    If you did that, they'd have to think up even sillier questions to ask people. – Matt Boehm Jun 18 '09 at 17:38
  • 1
    about "Array.Reverse sounds much better to me. (Okay, so it still fails on surrogates etc, but hey...)". What are surrogates? I believe i was once watching a video and you said reversing "Les Misérables" would get the incorrect results. However i tried it the moment you said it and it didn't (I believe this was a year ago and it was SO related. You talked about date/time and numbers as well). Although it doesn't show up i pretty much did this in a winform app using .NET 3.5 http://ideone.com/3ZzPg -edit- maybe this code is better. It says true http://ideone.com/SSNfN –  Mar 06 '12 at 15:36
  • @acidzombie24: Your code uses the single character representation of an acute e. The failure occurs if you represent it using *two* characters: a plain e and a combining character. Surrogate pairs are slightly different, but similar - that's where two UTF-16 code units are combined to create a represent Unicode code point. – Jon Skeet Mar 06 '12 at 16:28
  • 1
    @JonSkeet: What string contains two characters to represent a character? I tried to find one once before and kept my eyes open since that video but I never found a string with a character like that –  Mar 06 '12 at 16:36
  • @acidzombie24: Just because you haven't seen one doesn't mean it doesn't exist, or that it's not valid, either for surrogate pairs or for combining characters. You can't represent a Unicode code point above U+FFFF in UTF-16 without using surrogate pairs... see http://msmvps.com/blogs/jon_skeet/archive/2009/11/02/omg-ponies-aka-humanity-epic-fail.aspx for more details including the representation of "Les Misérables" with a combining character. – Jon Skeet Mar 06 '12 at 16:42
  • @JonSkeet: Thanks. Funny enough `ser.Deserialize(@"""Les Mis\u0301erables""")` the acute character was 's' instead of 'e'. When reversing it was over the e instead of s –  Mar 06 '12 at 17:11
  • @acidzombie24: Exactly. See the blog post for *why* you saw it on the wrong character... – Jon Skeet Mar 06 '12 at 17:13
7

Since strings are immutable, each += statement will create a new string by copying the string in the last step, along with the single character to form a new string. Effectively, this will be an O(n2) algorithm instead of O(n).

A faster way would be (O(n)):

// pseudocode:
static string ReverseString(string input) {
    char[] buf = new char[input.Length];
    for(int i = 0; i < buf.Length; ++i)
       buf[i] = input[input.Length - i - 1];
    return new string(buf);
}
Mehrdad Afshari
  • 414,610
  • 91
  • 852
  • 789
  • 1
    n² will be especially significant on "large strings". – Dour High Arch Jun 17 '09 at 21:50
  • 2
    This is the most common .NET gotcha that I've seen. string allocation can be a bottleneck because the temp strings can hamper GC performance. It's a particularly good interview question to test .NET experience vs "I'm a C++ programmer who read a C# book last week" – Jimmy Jun 17 '09 at 21:51
  • 1
    As a side note, a generational GC (like .NET GC) is pretty good at allocating and deallocating short lived object. – Mehrdad Afshari Jun 17 '09 at 21:53
3

You can do this in .NET 3.5 instead:

    public static string Reverse(this string s)
    {
        return new String((s.ToCharArray().Reverse()).ToArray());
    }
jasonh
  • 29,297
  • 11
  • 59
  • 61
  • 1
    (Even if it worked, it wouldn't be ideal. Enumerable.Reverse() has to build up a buffer of elements, which needs to be resized periodically. There's then the matter of iterating over it, etc. Using Array.Reverse is much more efficient. Yes, it takes a couple more lines of code - but it's better, IMO.) – Jon Skeet Jun 17 '09 at 22:26
  • 1
    Did you call ToArray on the result of Reverse, perhaps? return new String(s.ToCharArray().Reverse().ToArray()); – Jon Skeet Jun 17 '09 at 22:27
1

Better way to tackle it would be to use a StringBuilder, since it is not immutable you won't get the terrible object generation behavior that you would get above. In .net all strings are immutable, which means that the += operator there will create a new object each time it is hit. StringBuilder uses an internal buffer, so the reversal could be done in the buffer w/ no extra object allocations.

chris.w.mclean
  • 1,793
  • 14
  • 17
  • ahh, += makes a new object! That is insane. I always thought the '=' forces this to be an internal operations. Why is string allowed to update itself to point to a new string!?! –  Jun 17 '09 at 22:10
  • String isn't allowed to update itself - a string *variable* is allowed to be reassigned to refer to a different string though. – Jon Skeet Jun 17 '09 at 22:18
1

You should use the StringBuilder class to create your resulting string. A string is immutable so when you append a string in each interation of the loop, a new string has to be created, which isn't very efficient.

Charlie
  • 10,227
  • 10
  • 51
  • 92
  • 2
    Don't immediately rush to StringBuilder automatically any time there's a string issue. There may be other, simpler solutions - Marc's code is nice and elegant. – Jon Skeet Jun 17 '09 at 22:18
1

This method cuts the number of iterations in half. Rather than starting from the end, it starts from the beginning and swaps characters until it hits center. Had to convert the string to a char array because the indexer on a string has no setter.

    public string Reverse(String value)
    {
        if (String.IsNullOrEmpty(value)) throw new ArgumentNullException("value");

        char[] array = value.ToCharArray();

        for (int i = 0; i < value.Length / 2; i++)
        {
            char temp = array[i];
            array[i] = array[(array.Length - 1) - i];
            array[(array.Length - 1) - i] = temp;
        }

        return new string(array);
    }
1

I prefer something like this:

using System;
using System.Text;
namespace SpringTest3
{
    static class Extentions
    {
        static private StringBuilder ReverseStringImpl(string s, int pos, StringBuilder sb)
        {
            return (s.Length <= --pos || pos < 0) ? sb : ReverseStringImpl(s, pos, sb.Append(s[pos]));
        }

        static public string Reverse(this string s)
        {
            return ReverseStringImpl(s, s.Length, new StringBuilder()).ToString();
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("abc".Reverse());
        }
    }
}
JohnP
  • 677
  • 1
  • 9
  • 18
1

x is the string to reverse.

        Stack<char> stack = new Stack<char>(x);

        string s = new string(stack.ToArray());
Sandeep
  • 7,156
  • 12
  • 45
  • 57
0

Necromancing.
As a public service, this is how you actually CORRECTLY reverse a string
(reversing a string is NOT equal to reversing a sequence of chars)

public static class Test
{

    private static System.Collections.Generic.List<string> GraphemeClusters(string s)
    {
        System.Collections.Generic.List<string> ls = new System.Collections.Generic.List<string>();

        System.Globalization.TextElementEnumerator enumerator = System.Globalization.StringInfo.GetTextElementEnumerator(s);
        while (enumerator.MoveNext())
        {
            ls.Add((string)enumerator.Current);
        }

        return ls;
    }


    // this 
    private static string ReverseGraphemeClusters(string s)
    {
        if(string.IsNullOrEmpty(s) || s.Length == 1)
             return s;

        System.Collections.Generic.List<string> ls = GraphemeClusters(s);
        ls.Reverse();

        return string.Join("", ls.ToArray());
    }

    public static void TestMe()
    {
        string s = "Les Mise\u0301rables";
        // s = "noël";
        string r = ReverseGraphemeClusters(s);

        // This would be wrong:
        // char[] a = s.ToCharArray();
        // System.Array.Reverse(a);
        // string r = new string(a);

        System.Console.WriteLine(r);
    }
}

See: https://vimeo.com/7403673

By the way, in Golang, the correct way is this:

package main

import (
  "unicode"
  "regexp"
)

func main() {
    str := "\u0308" + "a\u0308" + "o\u0308" + "u\u0308"
    println("u\u0308" + "o\u0308" + "a\u0308" + "\u0308" == ReverseGrapheme(str))
    println("u\u0308" + "o\u0308" + "a\u0308" + "\u0308" == ReverseGrapheme2(str))
}

func ReverseGrapheme(str string) string {

  buf := []rune("")
  checked := false
  index := 0
  ret := "" 

    for _, c := range str {

        if !unicode.Is(unicode.M, c) {

            if len(buf) > 0 {
                ret = string(buf) + ret
            }

            buf = buf[:0]
            buf = append(buf, c)

            if checked == false {
                checked = true
            }

        } else if checked == false {
            ret = string(append([]rune(""), c)) + ret
        } else {
            buf = append(buf, c)
        }

        index += 1
    }

    return string(buf) + ret
}

func ReverseGrapheme2(str string) string {
    re := regexp.MustCompile("\\PM\\pM*|.")
    slice := re.FindAllString(str, -1)
    length := len(slice)
    ret := ""

    for i := 0; i < length; i += 1 {
        ret += slice[length-1-i]
    }

    return ret
}

And the incorrect way is this (ToCharArray.Reverse):

func Reverse(s string) string {
    runes := []rune(s)
    for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
        runes[i], runes[j] = runes[j], runes[i]
    }
    return string(runes)
}

Note that you need to know the difference between
- a character and a glyph
- a byte (8 bit) and a codepoint/rune (32 bit)
- a codepoint and a GraphemeCluster [32+ bit] (aka Grapheme/Glyph)

Reference:

Character is an overloaded term than can mean many things.

A code point is the atomic unit of information. Text is a sequence of code points. Each code point is a number which is given meaning by the Unicode standard.

A grapheme is a sequence of one or more code points that are displayed as a single, graphical unit that a reader recognizes as a single element of the writing system. For example, both a and ä are graphemes, but they may consist of multiple code points (e.g. ä may be two code points, one for the base character a followed by one for the diaresis; but there's also an alternative, legacy, single code point representing this grapheme). Some code points are never part of any grapheme (e.g. the zero-width non-joiner, or directional overrides).

A glyph is an image, usually stored in a font (which is a collection of glyphs), used to represent graphemes or parts thereof. Fonts may compose multiple glyphs into a single representation, for example, if the above ä is a single code point, a font may chose to render that as two separate, spatially overlaid glyphs. For OTF, the font's GSUB and GPOS tables contain substitution and positioning information to make this work. A font may contain multiple alternative glyphs for the same grapheme, too.

Stefan Steiger
  • 78,642
  • 66
  • 377
  • 442
0
 static string reverseString(string text)
    {
        Char[] a = text.ToCharArray();
        string b = "";
        for (int q = a.Count() - 1; q >= 0; q--)
        {
            b = b + a[q].ToString();
        }
        return b;
    }