How to transform this memcpy into a for?

Question

What is the difference between these two?

for (int i = 0; i < numSamples; i++) {
    mData[sampleIndex++] = *buffer++ * (1.0f / 32768);
}

and

memcpy(&mData[sampleIndex], buffer, (numSamples * sizeof(float)));

If I understood correctly, the first copies numSamples float values to mData, one by one. The second copies numSamples * sizeof(float) bytes to mData. Since we're copying numSamples times the number of bytes in a float, I think they do the same thing, except that the first one multiplies each value before storing it in mData.

So, is there a way to transform the memcpy into a for? Something like:

for (int i = 0; i < numSamples * sizeof(float); i++) {
    //What to put here?
}

Context:

const int32_t   mChannelCount;
const int32_t   mMaxFrames;
int32_t sampleIndex;
float          *mData;
float *buffer;

The loop does multiplication and copy. `memcpy` only does copy. Remove multiplication from the loop and you get a hand-made `memcpy`. — Maxim Egorushkin, Jan 28 '21 at 21:04
`std::copy_n(buffer, numSamples, mData);` for the raw copy or `std::transform(buffer, buffer + numSamples, mData, [](auto value) { return value * 1.0f / 32768; });` for the transformation would be clearer i.m.o. — Ted Lyngmo, Jan 28 '21 at 21:11
*So, is there a way to transform the memcpy into a for* -- Why do you want to turn a very fast operation into one that isn't as fast? — PaulMcKenzie, Jan 28 '21 at 21:17
The `memcpy` doesn't perform the `1.0f / 32768` calculation. The `memcpy` function copies bytes; no other functionality is performed. — Thomas Matthews, Jan 28 '21 at 22:06

anastaciu · Answer 1 · 2021-01-28T22:03:59.260

1

I gather from your post that you want a memcpy-like copy written as a for loop. In that case, you just need to use the same for loop without the multiplication:

for (int i = 0; i < numSamples; i++){
    mData[sampleIndex++] = *buffer++;
}

Note that memcpy can be more efficient than a for loop under some conditions (see the comments by Maxim Egorushkin and Jeremy Friesner below), so you may want to keep it.

Another, more idiomatic and, I would argue, better way to implement these operations is to use the algorithms provided by the C++ standard library, as suggested by Ted Lyngmo and rustyx.
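A minimal sketch of those two library calls, wrapped in functions so they can be tried in isolation. The function names `copySamples` and `copyScaledSamples` are hypothetical, and the sketch assumes that `mData` has room for `numSamples` values starting at `sampleIndex`; the index type is `std::size_t` here rather than the `int32_t` from the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Raw copy: the std::copy_n counterpart of the memcpy call.
// Hypothetical helper; returns the index just past the last written sample.
std::size_t copySamples(const float* buffer, std::size_t numSamples,
                        float* mData, std::size_t sampleIndex) {
    assert(buffer != nullptr && mData != nullptr);
    std::copy_n(buffer, numSamples, mData + sampleIndex);
    return sampleIndex + numSamples;
}

// Copy with scaling: the std::transform counterpart of the original loop.
// Hypothetical helper; returns the index just past the last written sample.
std::size_t copyScaledSamples(const float* buffer, std::size_t numSamples,
                              float* mData, std::size_t sampleIndex) {
    assert(buffer != nullptr && mData != nullptr);
    std::transform(buffer, buffer + numSamples, mData + sampleIndex,
                   [](float value) { return value * (1.0f / 32768); });
    return sampleIndex + numSamples;
}
```

Unlike the loop in the question, neither helper modifies the caller's `buffer` or `sampleIndex`; the caller would assign the returned index if it needs to keep writing after the copied block.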

Disclaimer: As I was writing my answer, Martin York posted a comment with a similar solution, so credit goes to him as well.

edited Jan 28 '21 at 22:03

answered Jan 28 '21 at 21:08


`memcpy` may not be _far more efficient_. Unlike `memcpy`, the compiler may know alignment and size of the data involved and vectorize the loop into a more efficient SIMD-copy with no `memcpy` prologue and epilogue. Same applies to `memset`. – Maxim Egorushkin Jan 28 '21 at 21:23
@MaximEgorushkin, I added a "usually" following your comment. – anastaciu Jan 28 '21 at 21:24

@MaximEgorushkin I get the impression that modern C++ compilers are often clever enough to take their knowledge of alignment and data-size and use it to generate calls to different specialized flavors of `memcpy()` based on which would be most efficient and still work. – Jeremy Friesner Jan 28 '21 at 21:26
`memcpy` is more efficient when the size and alignment aren't known and the size is under a specific limit. Above that size, plain `rep movsb` beats [`memcpy`](https://stackoverflow.com/questions/43343231/enhanced-rep-movsb-for-memcpy) on modern hardware. – Maxim Egorushkin Jan 28 '21 at 21:26
Sure, but does it beat `memcpy_aligned_using_sse()` (i.e. a hypothetical, less-general version of `memcpy()` that the compiler knows about and will call instead of the general-purpose `memcpy()` when conditions permit) ? – Jeremy Friesner Jan 28 '21 at 21:28

@JeremyFriesner You are right, modern compilers treat `memcpy` as _a copy is required, use the best method given all available information at the call site_. But they do so for loops as well. – Maxim Egorushkin Jan 28 '21 at 21:37

@MaximEgorushkin as a C guy I'm always compelled to use the 'mem's. The fact is that modern compilers are very capable when it comes to optimizing less efficient code. Your discussion with Jeremy is a nice addition to the topic. – anastaciu Jan 28 '21 at 21:42
Fair enough. As a C++ guy I prefer assignment and initializer list to `memcpy`, and `whatever x = {};` to `memset` and expect the compiler to choose the best method for me. Those `size` arguments to `memcpy` and `memset` have been a source of bugs in 3rd-party code till this very day. Not to mention that `memcpy` and `memset` do not stop compiling when elements accidentally become no longer trivially copyable. – Maxim Egorushkin Jan 28 '21 at 21:46
@anastaciu C++ isn't C. `memcpy` isn't very idiomatic C++, as it's not type-safe. `std::copy_n` is. The end result (provided the absence of UB) will be identical (and if not, file a bug with Microsoft ;). – rustyx Jan 28 '21 at 21:47
@rustyx Or file a bug against another compiler: https://lemire.me/blog/2020/01/20/filling-large-arrays-with-zeroes-quickly-in-c/ – Maxim Egorushkin Jan 28 '21 at 21:53
@MaximEgorushkin, yes, old habits die hard. Still, I would not have suggested it in a C++ setting with no context behind it, but given that the OP is using it, I went with it. The `std::copy` family would probably be the best way to go. – anastaciu Jan 28 '21 at 21:56
I do not disagree strongly, but the reality is that when there is a possibility for a mistake, that possibility is realised one day (a function of the square root of occurrences). The `size` argument calculation for `memcpy` and `memset` is a source of bugs in my experience, and by not using `memcpy` or `memset` I eliminate this class of bugs from my code completely. – Maxim Egorushkin Jan 28 '21 at 21:59
@rustyx, right you are. Still I would argue against Microsoft bug correction, what is a good Microsoft implementation without some UB? It would lose its charm :) – anastaciu Jan 28 '21 at 22:00

eerorika · Answer 2 · 2021-01-28T22:18:25.410

What is the difference between these two?

The former performs a calculation on the source array while copying the result into another array a float at a time.

The latter copies the content of the array into another a byte at a time, without any calculation.

So, is there a way to transform the memcpy into a for?

Yes. Here is a naïve way to transform it:

// static_cast cannot convert float* to unsigned char*; reinterpret_cast is required
auto dest_c = reinterpret_cast<unsigned char*>(mData + sampleIndex);
auto src_c = reinterpret_cast<const unsigned char*>(buffer);
auto end = src_c + numSamples * sizeof(float);
for (; src_c < end;) { // or while(src_c < end)
    *dest_c++ = *src_c++;
}

The actual implementation of the standard function is likely more complex, involving optimisations related to copying long sequences.
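As a self-contained sketch of the byte-wise loop above, the following hypothetical helper `byteCopy` mirrors the signature of `std::memcpy` so the two can be compared directly. It assumes, as `std::memcpy` does, that the source and destination ranges do not overlap, and it makes no attempt at the optimisations a real implementation would use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies `count` bytes one at a time, like a naive memcpy.
// The source and destination ranges must not overlap.
void* byteCopy(void* dest, const void* src, std::size_t count) {
    assert(dest != nullptr && src != nullptr);
    auto* d = static_cast<unsigned char*>(dest);      // void* converts with static_cast
    auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i < count; ++i) {
        d[i] = s[i];
    }
    return dest;
}
```

With the names from the question, the call would read `byteCopy(&mData[sampleIndex], buffer, numSamples * sizeof(float));`, which is the body the asker's `for (... i < numSamples * sizeof(float) ...)` loop was looking for.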

Since you don't appear to need the generic reinterpretation aspect of std::memcpy, perhaps a simpler alternative would suffice:

auto dest = mData + sampleIndex;
auto src = buffer;
auto end = src + numSamples;
for (; src < end;) {
    *dest++ = *src++;
}

Or perhaps another standard algorithm:

std::copy(buffer, buffer + numSamples, mData + sampleIndex);

I agree that `while(c)` can be dropped from the next version of the standard, `for(;c;)` does it. — Maxim Egorushkin, Jan 28 '21 at 22:08
@MaximEgorushkin Same number of characters -> equally optimal :) That said, I used an extra space. — eerorika, Jan 28 '21 at 22:10

chux - Reinstate Monica · Accepted Answer · 2021-01-28T22:34:33.130

What is the difference between these two?

for (int i = 0; i < numSamples; i++) {
  mData[sampleIndex++] = *buffer++ * (1.0f / 32768);
}
// and
memcpy(&mData[sampleIndex], buffer, (numSamples * sizeof(float)));

These are quite different given the `* (1.0f / 32768)`. I assume the comparison sets the scaling difference aside (see the comment by Thomas Matthews).

Important: `buffer` and `sampleIndex` have different values after the for loop; `memcpy()` leaves both unchanged.
`*buffer++` needs no code change should the type of `buffer` change. `* sizeof(float)` obliges a code change. Could have used `* sizeof *buffer`.
`memcpy()` is optimized code for the platform. `for()` loops can only do so much. In particular, `memcpy()` assumes `mData` and `buffer` do not overlap. The `for()` loop may not be able to make that optimization.
This `for` uses `int` indexing where `memcpy()` uses `size_t`. Makes a difference with huge arrays.
`memcpy()` tolerates unaligned pointers. `mData[sampleIndex++] = *buffer++` does not.

"the first copies numSamples float values to mData, one by one" is not certain. A smart compiler may be able to perform some of the copies in parallel, depending on the context, as long as the result is as if the copying was done one by one.
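The first two points in the list above can be illustrated with a small sketch. The struct `CopyState` and the functions `copyWithLoop` and `copyWithMemcpy` are hypothetical names introduced here; each function takes its own copies of `buffer` and `sampleIndex` and reports what they look like afterwards. The scaling is left out so that only the copying behaviour is compared.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical snapshot of the two variables after a copy.
struct CopyState {
    const float* buffer;      // where the next read would happen
    std::size_t sampleIndex;  // where the next write would happen
};

// Loop version: advances both buffer and sampleIndex as a side effect.
CopyState copyWithLoop(const float* buffer, std::size_t numSamples,
                       float* mData, std::size_t sampleIndex) {
    assert(buffer != nullptr && mData != nullptr);
    for (std::size_t i = 0; i < numSamples; i++) {
        mData[sampleIndex++] = *buffer++;
    }
    return {buffer, sampleIndex};
}

// memcpy version: copies the same data but leaves both variables untouched.
// sizeof *buffer follows the element type if buffer's type ever changes.
CopyState copyWithMemcpy(const float* buffer, std::size_t numSamples,
                         float* mData, std::size_t sampleIndex) {
    assert(buffer != nullptr && mData != nullptr);
    std::memcpy(&mData[sampleIndex], buffer, numSamples * sizeof *buffer);
    return {buffer, sampleIndex};
}
```

Both functions write the same values into `mData`, but code that follows the copy and relies on `buffer` or `sampleIndex` having advanced would need an explicit `buffer += numSamples; sampleIndex += numSamples;` after the `memcpy()` version.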

Post the entire block of code or function that uses these two approaches for a better comparison.
