1

This question drifted a bit afield due to the notion that move assignment methods should check for self assignment. Turns out this was not the case and had been the subject of this question So could we please get back to the question of whether there is something missing in code that otherwise seems faster than the default move assignment code.

I was testing move assignment code for performance when classes have constant or reference members which delete the default assignment when I found the coded version ran faster than expected. So then I tested it against the defaults but without constant or reference members. And to my surprise it ran faster than the default.

The performance test was sorting a vector of structs containing an int, vector, and string. The latter two needing dynamic memory with each struct object having variable vector and string lengths. Sorting structs is a common, expensive operation that shows the special move member methods effectiveness. This was the test struct member set:

struct A0 {
    int ci;
    std::vector<int> cv;
    std::string cs;
};

And here it is with the coded move assignment which runs about 10% faster:

struct A1 {
    int ci;
    std::vector<int> cv;
    std::string cs;
    A1() = default;
    A1(int i, const std::vector<int>& v, const std::string& s) : ci(i), cv(v), cs(s) {}
    A1(const A1&) = default;
    A1(A1&& arg) = default;
    A1& operator=(const A1& arg) = default;
    A1& operator=(A1&& arg) noexcept {
        std::destroy_at(this);
        std::construct_at(this, std::move(arg));
        return *this;
    }
};

Here's the full code that shows the population of a vector<Ax> with 1000 objects of Ax and the timing of the default move assignment v coded move assignment. Also compares performance of coded copy assignment and default with moves disabled. Huge improvement with moves of course.

While I tested it in MSVC, there is also a compiler explorer that shows similar results.

MSVC Results:

A_NM struct w/o moves: 0.0013056
A_NM_U user coded assign: 0.0031061
A0 Bare struct: 0.0001456
A0 Bare struct: 0.0001449
A0 Bare struct: 0.0001448
A1 Destroy/Construct move: 0.0001301
A1 Destroy/Construct move: 0.0001299
A1 Destroy/Construct move: 0.0001301

I don't understand why the defaults would be slower than the coded version so I'm afraid I'm somehow missing something in the move code.

So is the move code actually correct yet somehow faster?

#include <memory>
#include <iostream>
#include <vector>
#include <algorithm>
#include <type_traits>
#include <random>
#include <chrono>

struct Timer {
    std::chrono::system_clock::time_point snapTime;
    Timer() { snapTime = std::chrono::system_clock::now(); }
    operator double() { return std::chrono::duration<double>(std::chrono::system_clock::now() - snapTime).count(); }
};

template <typename T, size_t N = 1000>
std::vector<T> fill_vec()
{
    std::mt19937 rd{ 1 };
    std::vector<T> ret;
    ret.resize(N);
    std::generate(ret.begin(), ret.end(), [&]()
        {
            std::vector<int> v(5 + (rd() % 10));
            std::string s(32 + int(rd() % 10), '0');
            std::generate(v.begin(), v.end(), [&]() {return int(rd()); });
            std::generate(s.begin(), s.begin() + 10, [&]() {return char('0' + int(rd() % 10)); });
            return T{ int(rd()), v, s };
        });
    return ret;
}

// simple struct with implicit moves
struct A0 {
    int ci;
    std::vector<int> cv;
    std::string cs;
};

// Faster than default rule of 5!  Huh!
struct A1 {
    int ci;
    std::vector<int> cv;
    std::string cs;
    A1() = default;
    A1(int i, const std::vector<int>& v, const std::string& s) : ci(i), cv(v), cs(s) {}
    A1(const A1&) = default;
    A1(A1&& arg) = default;
    A1& operator=(const A1& arg) = default;
    A1& operator=(A1&& arg) noexcept {
        std::destroy_at(this);
        std::construct_at(this, std::move(arg));
        return *this;
    }
};


// Slower with moves deleted by declaring only c++98 ops
struct A_NM {
    int ci{};
    std::vector<int> cv{};
    std::string cs;
    A_NM(int i, const std::vector<int>& v, const std::string s) : ci(i), cv(v), cs(s) {}
    A_NM() = default;
    A_NM(const A_NM&) = default;
    A_NM& operator=(const A_NM& arg) = default;
};

// Slower than defaults with no moves
struct A_NM_U {
    int ci{};
    std::vector<int> cv{};
    std::string cs;
    A_NM_U(int i, const std::vector<int>& v, const std::string s) : ci(i), cv(v), cs(s) {}
    A_NM_U() = default;
    A_NM_U(const A_NM_U&) = default;
    A_NM_U& operator=(const A_NM_U& arg) {
        if (this==&arg)
            return *this;
        std::destroy_at(this);
        std::construct_at(this, arg);
        return *this;
    }
};

template <typename T>
void print(std::vector<T> arg, const std::string& descr)
{
    double mintime = 10000.;
    for (int i = 0; i < 5; i++)
    {
        auto v = arg;
        Timer timer;
        std::sort(v.begin(), v.end(), [](const T& a, const T& b) {return a.ci < b.ci; });
        double time = timer;
        if (mintime > time)
            mintime = time;
    }
    std::cout << descr << ": " << mintime << '\n';
}

int main()
{
    const int N = 1000;
    print(fill_vec<A_NM, N>(), "A_NM struct w/o moves");
    print(fill_vec<A_NM_U, N>(), "A_NM_U user coded assign");
    print(fill_vec<A0, N>(), "A0 Bare struct");
    print(fill_vec<A0, N>(), "A0 Bare struct");
    print(fill_vec<A0, N>(), "A0 Bare struct");
    print(fill_vec<A1, N>(), "A1 Destroy/Construct move");
    print(fill_vec<A1, N>(), "A1 Destroy/Construct move");
    print(fill_vec<A1, N>(), "A1 Destroy/Construct move");
}
doug
  • 3,840
  • 1
  • 14
  • 18
  • 7
    I don't know if it's the full story, but your version doesn't work for self-assignment. – Nelfeal Oct 24 '22 at 21:15
  • @Nelfeal Yeah. It doesn't. Doesn't apply to to the example though since it's only used on the abominably slow copy assignment. I'll add it. Not applicable to the move code. – doug Oct 24 '22 at 21:17
  • Self-assignment can also apply to rvalues. This is a bug in your "faster" code. – Drew Dormann Oct 24 '22 at 21:19
  • 1
    While it doesn't apply to your example, the default provided by the compiler is still going to make that check, as it is the correct thing to do, and that check comes at a cost. – NathanOliver Oct 24 '22 at 21:19
  • @NathanOliver There's lots of ways to break c++. Is prevention of `x=std::move(x)` going to occur in real life? If that test is actually occuring, why? – doug Oct 24 '22 at 21:24
  • 1
    Why _wouldn't_ it occur? That's valid code. Just like `x = x`, you tend to do a check to make sure you're not assigning to yourself... – ChrisMM Oct 24 '22 at 21:25
  • @ There's lots of semantically correct code that wreaks havoc. A performance penalty to prevent `x=std::move(x)` from just being a nop is beyond bizarre. – doug Oct 24 '22 at 21:27
  • 6
    Your measurements are strange. You sort the very same vector 5 times and choose the lowest measurement? Depending on the implementation, sorting a sorted vector might be almost a no-op. – Yksisarvinen Oct 24 '22 at 21:29
  • 3
    Sidenote: Don't use `system_clock` for measuring things like this. Use `steady_clock`. – Ted Lyngmo Oct 24 '22 at 21:32
  • @Yksisarvinen. Yes, it reduces the impact of OS delays from interrupt processing. You will see that if you just do a lot of runs without any repeats and the occasional high delay time. – doug Oct 24 '22 at 21:33
  • But that doesn't make sense. You are measuring time of 0 move sort (if vector is sorted, no swaps happen). Your measurement is the time of `sort` on sorted vector and likely doesn't depend on time of move of the structure. – Yksisarvinen Oct 24 '22 at 21:38
  • Added self assignment to code. No significant performance change. – doug Oct 24 '22 at 21:38
  • @Yksisarvinen Vector isn't sorted. Each, unsorted array is copied and then sorted for timing. – doug Oct 24 '22 at 21:40
  • Right, I missed that, sorry. Still, something irks me about this idea. Do you get the same results if you take the average time of each measurement insteaad? – Yksisarvinen Oct 24 '22 at 21:46
  • Or just add 5 times more elements and measure only one time? – Ted Lyngmo Oct 24 '22 at 21:50
  • Note that swapping the order of these measurements may also affect the result. – Ted Lyngmo Oct 24 '22 at 21:57
  • @Yksisarvinen. Yeah. I did the min things because is reduce the anomolous timing where occasionally a time would be 50% or more higher. Averaging might work wiht really large numbers of runs. Clearly OS issues. I first started off with multiple runs of just one pass. Pretty consistent results when tossing the anomolous. – doug Oct 24 '22 at 21:57
  • @NathanOliver Well isn't this interesting. Turns out both GCC and CLANG do not check self assignment in move assignment and nulls the result. See [this](https://godbolt.org/z/ocq3WEEc7) MSVC does and prevents the assignment.. – doug Oct 24 '22 at 22:50
  • OK people. Apparently move self assignment is not a thing. See [this](https://stackoverflow.com/questions/13127455/what-does-the-standard-library-guarantee-about-self-move-assignment) old question/answer. – doug Oct 25 '22 at 01:12

0 Answers0