10

I wrote this string extension awhile back, and I'm actually getting quite a bit of use out of it.

public static string Slice(this string str, int? start = null, int? end = null, int step = 1)
{
    if (step == 0) throw new ArgumentException("Step cannot be zero.", "step");

    if (start == null)
    {
        if (step > 0) start = 0;
        else start = str.Length - 1;
    }
    else if (start < 0)
    {
        if (start < -str.Length) start = 0;
        else start += str.Length;
    }
    else if (start > str.Length) start = str.Length;

    if (end == null)
    {
        if (step > 0) end = str.Length;
        else end = -1;
    }
    else if (end < 0)
    {
        if (end < -str.Length) end = 0;
        else end += str.Length;
    }
    else if (end > str.Length) end = str.Length;

    if (start == end || start < end && step < 0 || start > end && step > 0) return "";
    if (start < end && step == 1) return str.Substring((int)start, (int)(end - start));

    int length = (int)(((end - start) / (float)step) + 0.5f);
    var sb = new StringBuilder(length);
    for (int i = (int)start, j = 0; j < length; i += step, ++j)
        sb.Append(str[i]);
    return sb.ToString();
}

Since it's in all my projects now, I'm wondering if I could have done it better. More efficient, or would it produce unexpected results in any case?


Slice. It works like Python's array notation.

 "string"[start:end:step]

Many other languages have something like this too. string.Slice(1) is equivalent to string.Substring(1). string.Substring(1,-1) trims off the first and last character. string.Substring(null,null,-1) will reverse the string. string.Substring(step:2) will return a string with every other character... also similar to JS's slice but with an extra arg.


Re-revised based on your suggestions:

public static string Slice(this string str, int? start = null, int? end = null, int step = 1)
{
    if (step == 0) throw new ArgumentException("Step size cannot be zero.", "step");

    if (start == null) start = step > 0 ? 0 : str.Length - 1;
    else if (start < 0) start = start < -str.Length ? 0 : str.Length + start;
    else if (start > str.Length) start = str.Length;

    if (end == null) end = step > 0 ? str.Length : -1;
    else if (end < 0) end = end < -str.Length ? 0 : str.Length + end;
    else if (end > str.Length) end = str.Length;

    if (start == end || start < end && step < 0 || start > end && step > 0) return "";
    if (start < end && step == 1) return str.Substring(start.Value, end.Value - start.Value);

    var sb = new StringBuilder((int)Math.Ceiling((end - start).Value / (float)step));
    for (int i = start.Value; step > 0 && i < end || step < 0 && i > end; i += step)
        sb.Append(str[i]);
    return sb.ToString();
}
mpen
  • 272,448
  • 266
  • 850
  • 1,236
  • 5
    What's it supposed to do? I know I could work it out, but I'm feeling a bit lazy... – ChrisF Nov 24 '10 at 19:54
  • I'm intrigued as to know what you use it for? The step bit is intriguing. I understand what it does, but what is the practical application. Just interested. – Tim Lloyd Nov 24 '10 at 19:56
  • 1
    too many if else. and the longest extension method i ve seen – DarthVader Nov 24 '10 at 19:57
  • 1
    It looks like you could've just used `Skip()` and `Take()` extension methods that already existed to accomplish the same task... – Quintin Robinson Nov 24 '10 at 19:57
  • so Slice(abcdef,null,null,2) will return ace ? – Fredou Nov 24 '10 at 19:57
  • @Quintin Robinson, one thing, anything will be faster than LINQ but might be really ugly to look at – Fredou Nov 24 '10 at 19:58
  • @Fredou I disagree with your assertion that anything will be faster than LINQ – Quintin Robinson Nov 24 '10 at 19:58
  • @Fredou: That's exactly correct. – mpen Nov 24 '10 at 20:00
  • @Quintin Robinson ,http://stackoverflow.com/questions/3769989/should-linq-be-avoided-because-its-slow – Fredou Nov 24 '10 at 20:04
  • @Fredou that is heavily context specific, the question does not provide evidence that LINQ itself is the bottleneck, especially given the breadth of what LINQ actually means. Anyone can write code that performs badly with many different conventions (I'm only saying that because the test provided is purely nonsense code.) – Quintin Robinson Nov 24 '10 at 20:07
  • @Quintin Robinson, you would need to show me at least one example that LINQ would beat, speed wise, a manual implementation(good one, not bad one) – Fredou Nov 24 '10 at 20:10
  • @Fredou No I'll let you discover on your own, if you wish. The onus is on you to provide evidence for your blanket assertion of "Anything will be faster than LINQ". – Quintin Robinson Nov 24 '10 at 20:13
  • @Quintin Robinson, you might want to read this too http://stackoverflow.com/questions/1185209/do-linq-queries-have-a-lot-of-overhead and http://stackoverflow.com/questions/1182922/what-is-the-efficiency-and-performance-of-linq-and-lambda-expression-in-net – Fredou Nov 24 '10 at 20:14
  • @Fredou Thanks for extra links, trust me I'm very well versed, that is why I have a contention with your statement. I hope I am not coming off as too argumentative. – Quintin Robinson Nov 24 '10 at 20:16
  • @Mark, sorry about this debate, but people always bring LINQ into answer but they never think about the impact, if it's heavily called :-) – Fredou Nov 24 '10 at 20:17
  • @Quintin Robinson, **took from one of my link**, The authors of LINQ in Action did some benchmarking with for, foreach, List.FindAll, and LINQ queries that all did the same thing. Depending on how the queries were constructed, LINQ was only about 10% slower. As they put it, LINQ does not come for free. – Fredou Nov 24 '10 at 20:21
  • @Fredou Also from one of your links.. "LINQ greatly aids expressiveness of code dealing with data... and it's not that hard to write code which performs well, **so long as you take the time to understand LINQ to start with.** If anyone told me not to use LINQ (especially LINQ to Objects) for perceived reasons of speed I would laugh in their face." – Quintin Robinson Nov 24 '10 at 20:23
  • @Quintin Robinson and that is a *Jon Skeet* quote... – jb. Nov 24 '10 at 20:38
  • @Quintin Robinson, I didn't said not to use LINQ. I said LINQ is always slower than a manual implementation, I do use LINQ a lot and I appreciate it, but I never saw one, only one example, where LINQ was faster :-) – Fredou Nov 24 '10 at 20:49
  • @Fredou Sounds implementation specific, sorry to hear you've run into the pitfalls as you have. Hopefully your future endeavors are more fruitful! =) – Quintin Robinson Nov 24 '10 at 20:53
  • @Quintin Robinson, look at my answer below, I put a LINQ solution, test it and tell me how you would speed it up because it is very slow – Fredou Nov 25 '10 at 04:10
  • Mark: Your function has a bug in it. See my answer (http://stackoverflow.com/questions/4270928/any-way-to-improve-this-string-slice-method/4273810#4273810) for one way to fix it. – Gabe Nov 25 '10 at 15:57

3 Answers3

2

If you have plenty of test cases, then detecting unexpected results shouldn't be an issue if you wish to experiment with different implementations.

From an API perspective I would consider optional arguments rather than nullable ints.

Update

After reading the code closely, I can see that giving "start" and "end" a value of null, has a special meaning when taking "step" into consideration, therefore, they could not be represented as optional int parameters alone, however, they could still be optional parameters.

After looking at the code more closely, it's a bit of a funky API as the values of individual parameters have an affect on each other. My previous comment alludes to this. You really have to know the implementation to work this out, not generally a good API aspect. And possibly makes for a difficult readability experience.

I can see how "step" can be used to reverse a string, which is potentially useful. But wouldn't a Reverse extension method be better for this? Much more readable and less of a mental speedbump.

Tim Lloyd
  • 37,954
  • 10
  • 100
  • 130
  • You mean I should overload the method a bunch of times then? Can't really do that as they're all ints... it won't know which is which. Works better with .net 4 where you can just go `string.Slice(end:-1)` to skip the first 2 args. – mpen Nov 24 '10 at 20:00
  • @Mark No optional arguments are a new C# 4.0 language feature. I have updated my answer with a link. – Tim Lloyd Nov 24 '10 at 20:02
  • @chibacity: I don't understand. How do you want me to make them optional if they can't be null? I need to give them a default value. 0 is a legal value, so I have to use something else. – mpen Nov 24 '10 at 20:05
  • Python does this. Try `"string"[::-1]`. You'll get "gnirts". The user doesn't need to think about what the start and end value will be. If they're omitted it just means "use the whole string" or "from start to finish". *Where* the start and finish are matters algorithmically, but shouldn't matter to the user of the API. – mpen Nov 24 '10 at 20:24
  • @Mark I think "bling.Reverse()" is much more readable, but that's just my opinion. :) – Tim Lloyd Nov 24 '10 at 20:27
  • @chibacity: Then they can use that. This is a versatile method. It doesn't have to be `-1`. It can be `-2`. Then you'd have to reverse it, and *then* take every other char. If you want to do that, then fine, but you might as well kill two birds with one stone, no? – mpen Nov 24 '10 at 20:40
  • @Mark I can see that it is flexible. I'm just wondering if being able to produce a reversed string of every other n characters is really that useful in practice. – Tim Lloyd Nov 24 '10 at 20:45
  • @chibacity: Probably not. 99% of the time I only use the first two args :) – mpen Nov 24 '10 at 20:58
1

I can see 3 things, very really minor one

change the inner if into ternary like

        if (start == null)
        {
            start = step > 0 ? 0 : str.Length - 1;
        }
        else if (start < 0)
        {
            start = start < -str.Length ? 0 : str.Length + start;
        }
        else if (start > str.Length) 
            start = str.Length; 

maybe change the (int)int? into int.Value

change

   var sb = new StringBuilder(length);

into

   StringBuilder sb = new StringBuilder(length);

and the big question is, if it does what it need, why fixing it?


update to show how to do it with LINQ, way slower (is there a way to speed it up?)

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Diagnostics;

    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main(string[] args)
            {
                Stopwatch sw;
                string str;

                sw = Stopwatch.StartNew();
                for (int i = 0; i < 1000000; i++)
                    str = "Step cannot be zero.".Slice(null, null, -3, true);
                sw.Stop();
                Console.WriteLine("LINQ " + sw.Elapsed.TotalSeconds.ToString("0.#######") + " seconds");

                sw = Stopwatch.StartNew();
                for (int i = 0; i < 1000000; i++)
                    str = "Step cannot be zero.".Slice(null, null, -3, false);
                sw.Stop();
                Console.WriteLine("MANUAL " + sw.Elapsed.TotalSeconds.ToString("0.#######") + " seconds");

                Console.ReadLine();
            }
        }

       static class  test
        {
            public static string Slice(this string str, int? start, int? end, int step, bool linq)
            {
                if (step == 0) throw new ArgumentException("Step cannot be zero.", "step");

                if (linq)
                {

                    if (start == null) start = 0;
                    else if (start > str.Length) start = str.Length;

                    if (end == null) end = str.Length;
                    else if (end > str.Length) end = str.Length;

                    if (step < 0)
                    {
                        str = new string(str.Reverse().ToArray());
                        step = Math.Abs(step);
                    }
                }
                else
                {
                    if (start == null)
                    {
                        if (step > 0) start = 0;
                        else start = str.Length - 1;
                    }
                    else if (start < 0)
                    {
                        if (start < -str.Length) start = 0;
                        else start += str.Length;
                    }
                    else if (start > str.Length) start = str.Length;

                    if (end == null)
                    {
                        if (step > 0) end = str.Length;
                        else end = -1;
                    }
                    else if (end < 0)
                    {
                        if (end < -str.Length) end = 0;
                        else end += str.Length;
                    }
                    else if (end > str.Length) end = str.Length;


                }

                if (start == end || start < end && step < 0 || start > end && step > 0) return "";
                if (start < end && step == 1) return str.Substring(start.Value, end.Value - start.Value);

                if (linq)
                {
                    return new string(str.Skip(start.Value).Take(end.Value - start.Value).Where((s, index) => index % step == 0).ToArray ());;
                }
                else
                {
                    int length = (int)(((end.Value - start.Value) / (float)step) + 0.5f);
                    var sb = new StringBuilder(length);
                    for (int i = start.Value, j = 0; j < length; i += step, ++j)
                        sb.Append(str[i]);
                    return sb.ToString();
                }
            }

        }
    }
Fredou
  • 19,848
  • 10
  • 58
  • 113
  • 3
    The change from `var` to `StringBuilder` is purely cosmetic and a matter of personal choice or company coding standards. It has no effect on the efficiency (or otherwise) of the code. – ChrisF Nov 24 '10 at 21:06
  • Surprised I didn't see the opportunity for a ternary operator there... I'm usually all over those! Thanks :) Didn't know I could do `int?.Value`. – mpen Nov 24 '10 at 21:18
1

When I ask Python for "abcdefghijklmn"[::6] it returns 'agm', but when I ask your function for "abcdefghijklmn".Slice(step:6) it returns "ag".

I would recommend removing the incorrect length calculation and just performing your loop like this:

var sb = new StringBuilder((end - start).Value / step);
for (int i = start.Value; step > 0 && i < end || step < 0 && i > end; i += step)
    sb.Append(str[i]);
Gabe
  • 84,912
  • 12
  • 139
  • 238
  • Was worried there might still be a bug in that bit. Thank you!! – mpen Nov 25 '10 at 20:22
  • Hrm... 14/6 = 2.33. The +.5 was supposed to round it up to 3... maybe a `ceil` would have been more appropriate? – mpen Nov 25 '10 at 20:26