-4

I wrote a small program to test the speed of two separate extension methods I created, detailed in the below screenshot:

public static class Extensions
{
    public static List<string> ParseEmails(this string[] emails, char[] charsToSplitOn)
    {
        List<string> list = new List<string>();
        foreach (string email in emails)
        {
            string[] splitArr = email.Replace(" ", "").Split(charsToSplitOn, StringSplitOptions.RemoveEmptyEntries);
            foreach (string item in splitArr)
            {
                list.Add(item);
            }
        }
        return list;
    }

    public static string[] ParseEmails2(this string[] emails, char[] charsToSplitOn)
    {
        string str = string.Empty;
        foreach (string item in emails)
        {
            str += item + ';';
        }
        return str.Replace(" ", "").Split(charsToSplitOn, StringSplitOptions.RemoveEmptyEntries);
    }
}

enter image description here

The Main method initializes a Stopwatch class to track the time each method takes to execute iterations amount of times.

The first method, ParseEmails has a for loop within another for loop, while the second method 'ParseEmails2' has only one for loop.

I would expect that then the second method, ParseEmails2, would be faster, as, from my understanding, this is done in O(n) time, whereas my first method, ParseEmails, is done in O(n^2) time.

If this is true, then shouldn't my results point to ParseEmails2 being the faster of the two methods?

Delfino
  • 967
  • 4
  • 21
  • 46
  • Code is now posted as code. – Delfino Nov 19 '18 at 18:41
  • 1
    You don't add timings and your methods have different return types. Comparing apples to oranges etc. You cannot assume library methods (Split, Replace) to be O(1) – H H Nov 19 '18 at 18:41
  • ParseEmails2 creates more `string` instances – Muhammad Azeez Nov 19 '18 at 18:43
  • 1
    `ParseEmails` is not O(n^2) since `n` would be the number of emails but the inner loop depends on the number of elements in the split array instead. `str += item + ';';` is a very inefficient way of appending to the end of a string. In general though, an O(n) algorithm can be slower than an O(n^2) algorithm depending on the size of the input and the associated constant factors. – Lee Nov 19 '18 at 18:46
  • Be aware that Big-O notation does not directly equate to time used or memory used, it only relates to the growth of those things. If processing the emails always takes 2 years, it's an O(1) algorithm. – Lasse V. Karlsen Nov 19 '18 at 18:55

4 Answers4

0

You have to be precise about what you mean by O(n^2). What is n? If n corresponds to the number of emails (in the emails array), then the first method runs on the scale of O(n), because you iterate over all emails once.

In the second method you use string concatenation, which means that you create a new string each run of the loop resulting in a complexity of O(n^2).

Prefer using StringBuilder or Lists over concatenating strings inside a loop.

areller
  • 4,800
  • 9
  • 29
  • 57
  • "both methods run on the scale of O(n), because in both methods you iterate once over all emails." That's only true if the body of the loop is an O(1) operation, which it is not. – Servy Nov 19 '18 at 18:51
  • @Servy The body of the loop is constant in terms of the number of emails, it depends on the size of individual emails. This is where you have to apply common sense and ask yourself whether you have to worry about high variance in size of different emails, if not, you can treat it as a constant. – areller Nov 19 '18 at 18:53
  • 1
    That statement is true for the first solution. The second solution's loop body's execution time *is* dependent on the number of emails. – Servy Nov 19 '18 at 18:55
0

Well...you don't seem to understand the purpose of the O-notation, O is less about the performance and more about the consistancy of performance. O(n²) means a parabola and O(n) a straight line, but it doesn't tell you about the actual time. So once you have many elements, the O(n) algorithm should be faster. A reason for that could be that you use string's += and + operators which use the copy on write-semantic of C# string.

chrissx
  • 13
  • 1
  • 5
-1

ParseEmails2 is not actually O(n), because

str += item + ';';

contains a loop.

Timbo
  • 27,472
  • 11
  • 50
  • 75
  • @HenkHolterman Does the compiler optimize away all the re-allocation and copying the ever growing string into new string objects? – Timbo Nov 19 '18 at 18:46
  • @Timbo it does create a new instance, but it's not a loop. It's equivalent of saying: `str = str + item + ';'` – Muhammad Azeez Nov 19 '18 at 18:47
  • 1
    @Encrypt0r But initializing the new instance with characters does iterate over all the characters of the the parts string is constructed from? – Timbo Nov 19 '18 at 18:49
  • @Timbo well, in that case, Replace and Split are also loops. I think concatenating two strings in c# should be considered as one operation. – Muhammad Azeez Nov 19 '18 at 18:58
  • @Encrypt0r The Replace and Split are only run once in ParseEmails2, so they don't at to big-O complexity. And the cost of the string concatenations (especially when str grows big) clearly depends on the size of the input data. In the last iteration, *all* email addresses are copied in that *one* operation. – Timbo Nov 19 '18 at 19:05
  • 1
    @Encrypt0r Yes, replacing, splitting, or concattenating strings are *all* operations that are O(n) where n is the size of the string(s) the operation is being performed on. The first loop only performs those operations on small strings, the individual strings being looped on (making the loop O(2*n), effectively), the second performs them on very large strings, specifically, the aggregate of all of the strings processed so far, making it O(n*(n -1)/2) or O(n^2), – Servy Nov 19 '18 at 19:16
  • @Servy I stand corrected, it seems that the blessings of C# have made my skills a bit dull – Muhammad Azeez Nov 19 '18 at 19:43
-1

you should check this link. string concatenation vs builder performance String concatenation vs String Builder. Performance

The string builder will most likely be marginally faster in this case, but really it's probably not going to be enough to fret about. The number of times you are re-creating the string (because strings are immutable and joining forces a new string to be created from the two existing ones).

Because of memory allocation on very step will cause more time.

Sameer
  • 509
  • 3
  • 16