
As is well known, reading from a misaligned address is slower than reading from an aligned address. But what about writing data to a misaligned address?

Based on the results of the following test, the performance is approximately the same:

#include <iostream>
#include <chrono>
#include <limits>

int main()
{
    char *number = new char[sizeof(unsigned int) + 1];
    ++number; // misaligned address
    unsigned int *n = reinterpret_cast<unsigned int*>(number);
    *n = 0;

    unsigned int divisor;
    std::cin >> divisor;
    auto start = std::chrono::steady_clock::now();
    for (unsigned int i = 0; i < std::numeric_limits<unsigned int>::max() / divisor; i++) {
        *n = i;
    }
    auto end = std::chrono::steady_clock::now();

    std::cout << *n << '\n';
    std::cout << (end - start).count() << '\n';
}

But I am not sure that the test is correct, or that writing data to misaligned and aligned addresses really costs the same at the processor level. Are there any performance issues in this case?

Joe Joe
  • I would think it would depend on the hardware. – Galik Jul 01 '18 at 10:59
  • @Galik First, thank you for the answer. Sure, can you explain in more detail? – Joe Joe Jul 01 '18 at 11:04
  • Did you verify that the loop doesn't get optimized out? You only use the last value, and the optimizer might be aware of that. Furthermore, IO is probably significant overhead compared to writing memory. You should move the output until after measuring the time. – eerorika Jul 01 '18 at 11:07
  • 2
    I just think it is likely that different `CPU`s and different *memory management units* would handle things differently. – Galik Jul 01 '18 at 11:08
  • Furthermore, I think this sort of benchmark should measure CPU time rather than wall-clock time. (`chrono` has no clock for measuring CPU time) – eerorika Jul 01 '18 at 11:17
  • 3
    `*n = 0;` is undefined behaviour, as the memory pointed to by `n` is not and has never been an `unsigned int`. Use placement `new` rather than `reinterpret_cast` to get a valid `unsigned int` pointer. – Richard Critten Jul 01 '18 at 11:26
  • 1
    On hardware: from Google: "A barrel shifter is a digital circuit that can shift a data word by a specified number of bits without the use of any sequential logic, only pure combinational logic. ... A barrel shifter is often used to shift and rotate n-bits in modern microprocessors, typically within a single clock cycle." Thus, some hardware handles misalignment quite nicely, if the compiler supports it and you enable it. See Example at "https://gcc.gnu.org/onlinedocs/gcc-7.3.0/gcc.pdf" search for barrel-shifter. Probably not the only approach. – 2785528 Jul 01 '18 at 11:42
  • @user2079303 I changed the code so that the compiler could not apply the optimization (see edited post), but the result is still the same – Joe Joe Jul 01 '18 at 11:46
  • @Richard Critten I don't think there is undefined behaviour. I have sizeof(unsigned int) bytes in my program and I can use those bytes as I wish. Why not? Can you prove this by referring to the standard? – Joe Joe Jul 01 '18 at 11:48
  • It is undefined behaviour in C++ to access misaligned memory. – n. m. could be an AI Jul 01 '18 at 11:50
  • If you constantly write to the same address, you will have an extremely hot cache line. But do the values ever reach main memory? – Bo Persson Jul 01 '18 at 11:52
  • 1
    @JoeJoe read the __Type aliasing__ section here: https://en.cppreference.com/w/cpp/language/reinterpret_cast You can cast to pointer to `char` (or similar) and use the pointer but not from `char` (or similar), unless the original memory was of the type you are casting to (via a `char` pointer). – Richard Critten Jul 01 '18 at 11:54
  • @n.m. Any proof from the standard? For example, I have a struct that I need to send to another server. For this I use sizeof(structname) to get the size of the data to send, but the result is wrong because of alignment padding. Most compilers provide options to disable alignment for this and similar cases. If it is undefined behaviour, why do most compilers offer that ability? – Joe Joe Jul 01 '18 at 11:55
  • 1
    @JoeJoe compilers can offer extensions - you are now reliant on the compiler vendor's implementation and documentation. Standard C++ cannot help you. `std::memcpy` is the safe (Standard compliant) way to get data into/out of such byte streams. – Richard Critten Jul 01 '18 at 11:56
  • @RichardCritten Before you say "read the Type aliasing section here". Sure, but what about -fno-strict-aliasing on GCC? In this case, I should not have problems with type aliasing, yes? – Joe Joe Jul 01 '18 at 12:08
  • On x86, unaligned read/write performance is almost the same as aligned. See: https://stackoverflow.com/questions/45128763/how-can-i-accurately-benchmark-unaligned-access-speed-on-x86-64 – geza Jul 01 '18 at 12:31
  • How about reading `[basic.align]`, the first sentence? – n. m. could be an AI Jul 01 '18 at 13:04
  • Not all CPUs support misaligned access. – Jesper Juhl Jul 01 '18 at 13:22
  • @n.m. I read [basic.align] in the C++17 standard, but there is nothing there saying that it causes undefined behaviour. Can you copy-paste the part of the standard you mean? – Joe Joe Jul 01 '18 at 15:26
  • @JesperJuhl Yes, but I'm interested in processors that can – Joe Joe Jul 01 '18 at 15:28
  • @RichardCritten Do you believe that a valid lvalue doesn't always refer to an object? – curiousguy Jul 02 '18 at 06:23
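
As a rough illustration of the `std::memcpy` suggestion made in the comments, the sketch below stores and loads an `unsigned int` at a possibly misaligned address without dereferencing a misaligned `unsigned int*`. The helper names (`is_aligned_for_uint`, `store_unaligned`, `load_unaligned`, `round_trip_misaligned`) are hypothetical and do not appear in the question. The alignment check assumes that converting a pointer to `std::uintptr_t` yields the numeric address, which is implementation-defined but typical on mainstream platforms.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reports whether p is suitably aligned for unsigned int.
// Assumption: the pointer-to-integer conversion yields the numeric address
// (implementation-defined, but typical on mainstream platforms).
inline bool is_aligned_for_uint(const void *p)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(unsigned int) == 0;
}

// Hypothetical helper: store a value at an address that may be misaligned.
// std::memcpy copies bytes, so no unsigned int object is accessed through a
// misaligned pointer.
inline void store_unaligned(unsigned char *dst, unsigned int value)
{
    assert(dst != nullptr);
    std::memcpy(dst, &value, sizeof value);
}

// Hypothetical helper: load a value back from a possibly misaligned address.
inline unsigned int load_unaligned(const unsigned char *src)
{
    assert(src != nullptr);
    unsigned int value;
    std::memcpy(&value, src, sizeof value);
    return value;
}

// Round-trips a value through offset 1 of a buffer that is aligned for
// unsigned int, so the target address is misaligned on any platform where
// alignof(unsigned int) > 1.
inline unsigned int round_trip_misaligned(unsigned int value)
{
    alignas(unsigned int) unsigned char buffer[sizeof(unsigned int) + 1] = {};
    store_unaligned(buffer + 1, value);
    return load_unaligned(buffer + 1);
}
```

Calling `round_trip_misaligned(12345u)` should return the value it was given. Compilers that target hardware with cheap unaligned access commonly lower such a fixed-size `std::memcpy` to a single load or store, but that is an optimization rather than a guarantee, so it is worth checking the generated assembly before drawing benchmark conclusions.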

0 Answers