C array = array faster than memcpy()

Question

I have a piece of C code which I am trying to optimise which involves setting an array a to b. I am currently using memcpy to achieve this, and it works, however it's not fast enough. I.e.

double a[4] = {1.0, 2.0, 3.0, 4.0};
double b[4];
memcpy(b, a, sizeof(a));

This is a basic example, my program is similar but uses up to 9000 doubles. I know that the use of pointers can save a lot of time, but I'm not sure how to do it. You're help is greatly appreciated.

EDIT: I don't need to keep the a array, that can be discarded. I just need to transfer from a to b.

If you're copying the data, you're copying the data. I'd be surprised if it were easy to beat `memcpy` at that. — Daniel Fischer, May 21 '13 at 08:27
array object are evaluated like pointers in the memcpy function. Using pointers in memcpy will not make difference — MOHAMED, May 21 '13 at 08:28
@Nicholas Cooke I can not think of anything better than copying each element, if as the end result you need two copies of the data. — Kraken, May 21 '13 at 08:29
You should look at your design to see if all that copying is strictly necessary in the first place. — Paul R, May 21 '13 at 08:29
possible duplicate of [Very fast memcpy for image processing?](http://stackoverflow.com/questions/1715224/very-fast-memcpy-for-image-processing) — huon, May 21 '13 at 08:31
If `a` isn't required after the copy, but you need to transfer from `a` to `b`, why not use `a` instead of `b`? — jogojapan, May 21 '13 at 08:34
Thanks guys! I agree there has to be a better way of doing it in my program but I just wanted to check. I heard somewhere that if there was a pointer to the arrays I could set the pointer to array a equal to the pointer to array b and access the data through the pointers in which case no data has to be transferred, I just wasn't sure how to do it. — Nicholas Cooke, May 21 '13 at 08:37
@jogojapan letter in the program I use the values in b to determine new values for a. It goes through a while loop checking for a convergence in data. — Nicholas Cooke, May 21 '13 at 08:39
@JimBalter does this mean if I do something like b[0] = 1 it will change for both a and b or just b? And vice versa if I then change a value of a later will it be different in b? — Nicholas Cooke, May 21 '13 at 08:45
Yes, but I wrote that before you clarified. (Clarifying your needs helps a lot at SO.) See my answer below. — Jim Balter, May 21 '13 at 08:49
Nicholas, I've completely revamped my answer, having finally grasped, I think, what you're trying to do. Take a look, it may be just what you want. — Jim Balter, May 21 '13 at 08:58
@NicholasCooke A lot depends on how exactly the while loop operates. E.g. if it determines new values of `a[i]`, for `i` from 0 to N, and each computation of `a[i]` makes use of the old values of `a[j]` for `j=i`), then you could solve your problem by computing the new values from right to left (i.e. for `i` from N to 0, counting downwards). But that's just an example. My point is there may be ways to code the while loop such that no actual copy of `a` is required. — jogojapan, May 21 '13 at 09:04
@jogojapan unfortunately it relies on j < i and >= i. For this case that won't work, but thanks! — Nicholas Cooke, May 21 '13 at 09:14
@jogojapan That's clever, but actually there's no need to do a copy regardless of which way things go ... see my answer. — Jim Balter, May 21 '13 at 09:25
Yes, true -- or maybe not, depending on what exactly happens in the while loop. It may use some of the newly generated values and some of the old ones. Anyway, I just think it's important to discuss the actual algorithm rather than merely the optimization of the copy itself (as of course you also said earlier). — jogojapan, May 21 '13 at 09:33
@NicholasCooke This language doesn't have a concept of speed... It's your implementation that introduces speed: Your computer, your OS, your compiler and your library. Stop confusing implementation with specification. — autistic, May 21 '13 at 15:29
@undefinedbehaviour I was asking what is a better way to achieve the same functionality with memcpy but faster. By using a pointer swap as suggested in the accepted answer I was able to make my solution faster. I don't know where the confusion is. — Nicholas Cooke, May 22 '13 at 02:41
@NicholasCooke Do you suppose some implementations might be faster at speaking/reading languages than others? Why wouldn't it make sense to attach speed to the meaning of a language? — autistic, May 22 '13 at 02:55

score 8 · Answer 1 · edited Jun 20 '20 at 09:12

If memcpy() does not work, you are stuck. The memcpy() function, for large operands, is memory-bound so it is impossible^* to beat it. The only option is to redesign your program so it does not need to copy the arrays.

("Memory-bound" means that memcpy() is limited by the speed of your RAM or memory controller. Functions can be CPU-bound, memory-bound, IO-bound, etc.)

On most platforms, memcpy() is written in hand-tuned assembly language and heavily optimized to take advantage of various processor features (such as SSE). Trying to use multiple cores will not work because even if you spread the work across more cores, you are not spreading the work across more RAM or more memory controllers.

Footnotes

^* Some platforms or toolchains may have a poorly-optimized memcpy().

score 3 · Accepted Answer · edited May 23 '17 at 11:49

I use the values in b to determine new values for a. It goes through a while loop checking for a convergence in data.

In that case you may be able to avoid any copying, if you switch the arrays back and forth, something like this (which is backwards from what you wrote; adjust as needed):

double array1[SIZE], array2[SIZE];
double* a = array1, double* b = array2;
generate_initial_values(array1);

for (;;)
{
    // do either 
    memcpy(b, a, sizeof array1); // sizeof either array will do; *don't* use sizeof a or b, which is only the size of the pointer, not of the array
    update_values_in_b(b);

    // or, better:
    produce_modified_values_in_b_from_a(a, b);

    if (converged(a, b)) break;
    // switch arrays
    double* temp_ptr = a;
    a = b;
    b = temp_ptr;
}

Doing it the second way will be faster if that works for you. If you must memcpy, you can try the stuff in Very fast memcpy for image processing?, but probably the best for you is to use memcpy and set the compiler's optimization level as high as possible. Be sure that you #include <string.h> and that the size argument to memcpy is a compile-time constant (it is above), and look at the generated assembly code to verify that the compiler is inlining the copy.

Edit: Wait, here's another thought, that doesn't even require switching arrays:

double a[SIZE], b[SIZE];
generate_initial_values(a);

for (;;)
{
    produce_modified_values_in_second_array_from_first(a, b);
    if (converged(a, b)) break;
    produce_modified_values_in_second_array_from_first(b, a);
    if (converged(b, a)) break;
}

When you exit the loop you don't know which array has the latest values, but if they've converged you probably don't care. if you do, you can set a pointer to the latest values, or use a function:

void calling_function(void)
{
    ...
    double a[SIZE], b[SIZE];
    generate_initial_values(a);
    double* great_values = get_great_values(a, b); // returns either a or b
    ...
}

double* get_great_values(double* a1, double* a2)
{
    for (;;)
    {
        produce_modified_values_in_second_array_from_first(a1, a2);
        if (converged(a1, a2)) return a2;
        produce_modified_values_in_second_array_from_first(a2, a1);
        if (converged(a2, a1)) return a1;
    }
}

Hey Jim, thanks, this is what I was looking for. Sorry I didn't explain it in full to begin with :) — Nicholas Cooke, May 21 '13 at 09:13
@NicholasCooke I've made it even better ... you may not need any memcpy at all ... take a look. — Jim Balter, May 21 '13 at 09:21
@NicholasCooke Take another look ... I've improved on perfection, I think. — Jim Balter, May 21 '13 at 09:52
BTW, this is a classic example of an XY problem (google it). The problem wasn't really about either memcpy or pointers. — Jim Balter, May 21 '13 at 09:58

C array = array faster than memcpy()

2 Answers2

Footnotes