1

I'm very new to c++, but I do have experience with other object oriented programming languages.

I'm attempting to alphabetically sort the lines in a file I have, which has roughly 5300 lines. I originally wrote the program in c++ for practice, but was curious to see how it would perform against my main language, c#.

To my surprise, the same sorting algorithm which takes my c++ function 18-20 seconds to execute, finishes in less than 3 seconds in c#.

Given that I am very new to c++ (and not a very experienced programmer in general), I am sure this must be an error in the way I wrote something.

With all that being said, I am aware that there are quicker sorting methods to use. However, both programs are using the same algorithm so I don't understand the reason for the large performance gap.

I will note that I have tried converting the data to an array instead of a vector, but sorting the array was only consistently about 3 seconds faster (about 15 seconds total instead of 18).

What am I doing wrong? Any/all help is appreciated!

Below is the c++:

void select_sort_alphabetical(std::vector<std::string> _vector)
{
    std::cout << "<< SORTING... >>" << "\n\n";

    int char_index, i, j, size = _vector.size(), loop_iterations = 0;
    char char1, char2;
    std::string temp;
    
    // Iterate through all lines
    for (i = 0; i < (size - 1); i++)
    {
        for (j = (1 + i); j < size; j++)
        {
            char_index = 0;

            char1 = _vector[i][char_index]; // Getting first character of each line
            char2 = _vector[j][char_index];
            
            // While the letters to be compared are the same, move onto the next character
            while (char1 == char2)
            {
                char_index++;
                char1 = _vector[i][char_index]; // Setting chars to the next characters in each line
                char2 = _vector[j][char_index];
            }
            // Once the characters are different - if line x.ascii_code greater than line x+1.ascii_code...
            if (_vector[i][char_index] > _vector[j][char_index]) // comparing characters
            {
                // Swapping places
                temp = _vector[i];
                _vector[i] = _vector[j];
                _vector[j] = temp;
            }
            loop_iterations++;
        }
    }

    //print_lines_from_vect(_vector);

    // Clearing contents of vector and freeing up memory (trying to, anyway)
    _vector.clear();
    _vector.shrink_to_fit();

    std::cout << "\nIterations: " << loop_iterations << "\n";
}

and here is the c#:

public static string[] select_sort_alphabetical(string[] lines, ref int loop_iterations)
        {
            Console.WriteLine("<< SORTING... >>");

            // Iterate through all lines
            for (int i = 0; i < (lines.Length - 2); i++)
            {
                for (int j = (1 + i); j < (lines.Length); j++)
                {
                    int char_index = 0;

                    char char1 = lines[i][char_index]; // Getting first character of each line
                    char char2 = lines[j][char_index];

                    // While the letters to be compared are the same, move onto the next character
                    while (char1 == char2)
                    {
                        char_index++;
                        char1 = lines[i][char_index];
                        char2 = lines[j][char_index];
                    }
                    // Once the characters are different - if line x.ascii_code greater than line x+1.ascii_code...
                    if (lines[i][char_index] > lines[j][char_index]) // comparing characters
                    {
                        // Swapping places
                        string temp = lines[i];
                        lines[i] = lines[j];
                        lines[j] = temp;
                    }
                    loop_iterations++;
                }
            }
            return lines;
        }
Jr_
  • 11
  • 1
  • 1
    What optimization settings are you using for that C++ program? (i.e. in VS are you using Debug or Release mode)? – Chuck Walbourn Mar 21 '22 at 05:05
  • 6
    Of course in C++ you'd never actually write that sort. You'd just use [std::sort](https://en.cppreference.com/w/cpp/algorithm/sort), but I get this is a learning example. – Chuck Walbourn Mar 21 '22 at 05:06
  • Haha yes I read about std::sort, and I am running in debug in both cases – Jr_ Mar 21 '22 at 05:08
  • Try comparing Release to Release – Chuck Walbourn Mar 21 '22 at 05:10
  • Wow! Finally the execution speed I was hoping to see! I did not realize running release mode made such a big difference, thank you so much! – Jr_ Mar 21 '22 at 05:18
  • 3
    If you're impressed now, wait until you write a sorting algorithm that's actually fast! Your algorithm performs more than 200 times slower than "standard" solutions (_e.g._ quick sort / merge sort) for this 5300-record data set. Note also you should use `std::swap` to avoid a copy when swapping your strings. – paddy Mar 21 '22 at 05:33
  • Haha for sure, I read about quick sort but it looked a little complex with the partitioning and everything. Thanks for the tip on std::swap, I will keep that in mind in the future! – Jr_ Mar 21 '22 at 05:42
  • 1
    Note that c# swap does not incur copy while c++ does. A closer equivalent in c++ would have been to use `std::vector _vector` – UmNyobe Mar 21 '22 at 06:15
  • 1
    debug mode has lots of things added for debugging purpose and the code is not optimized so obvious it's very slow in any language and must not be used for benchmarking or publishing to customers: [Running a program in Debug mode is incredible slow](https://stackoverflow.com/q/4591187/995714), [Why is this code running over 100 times slower in Debug mode?](https://stackoverflow.com/q/36514709/995714), [Why is Debug Mode slower than Release in VS?](https://stackoverflow.com/q/11564267/995714), [Why is this code 100 times slower in debug?](https://stackoverflow.com/q/12631609/995714)... – phuclv Mar 21 '22 at 06:51

1 Answers1

2

One reason your algorithm would be slower, without taking in account other differences between languages, is swapping lines:

// Swapping places
string temp = lines[i];
lines[i] = lines[j];
lines[j] = temp;
  • in c#, string temp = original means temp and original point to the same string. modifying each will reflect in the other.
  • in c++, string temp = original means temp is a new string. Modifying one will not modify the other.

C++ provides move class members, which allow new objects to "steal" the resources of the original object.

std:.swap, in order to swap objects, make something like

string temp = steal(lines[i]);
lines[i] = steal(lines[j]);
lines[j] = steal(temp);

This is a simplification of the real mechanism.

Btw, if you use swap, you will see far faster debug, because that line swapping is a big cost in your algorithm.

UmNyobe
  • 22,539
  • 9
  • 61
  • 90
  • 3
    I'm not sure what's the reason of using `steal` instead of `std::move` here? – apple apple Mar 21 '22 at 07:13
  • and `std::string` has it's own specialization so it probably not using a temp string variable anyway. – apple apple Mar 21 '22 at 07:16
  • 2
    @appleapple `std::string::swap` is more sophisitcated than just a tripple `std::move`, so representing it as such would give a false impression. Also `std::move` doesn't guarantee the the other instances move-assignment will actually steal the content, it only *allows* the receiving end to do so. – Ext3h Mar 21 '22 at 08:13
  • @Ext3h well isn't it what I said (specialization)? and yes `std::move` doesn't guarantee that but `steal` in this answer doesn't either (you'd only recognize this pattern if you imply it's `std::move` afaict). – apple apple Mar 21 '22 at 14:36
  • @appleapple I've used the term steal not because it exists as a function, but because this is the terminology used to describe move constructors\assigned. `Move constructors typically "steal" the resources held by the argument (e.g. pointers to dynamically-allocated objects, file descriptors, TCP sockets, I/O streams, running threads, etc.) rather than make copies of them, and leave the argument in some valid but otherwise indeterminate state` – UmNyobe Mar 24 '22 at 20:48