
I have a question about duplicating a 0-terminated string:

const char * str = "Hello World !";
size_t getSize = strlen(str);
char * temp = new char[getSize + 1];

I know I can use this function:

memcpy(temp, str, getSize);

but I want to use my own copy loop, which works like this:

int Count = 0;
while (str[Count] != '\0') {
    temp[Count] = str[Count];
    Count++;
}

Both ways work correctly. Now I want to time each of them over 10 million iterations. For memcpy I do this:

const char * str = "Hello World !";
size_t getSize = strlen(str);
for (size_t i = 0; i < 10000000; i++) {
    char * temp = new char[getSize + 1];
    memcpy(temp, str, getSize);
}

and this is for my own way:

const char * str = "Hello World !";
size_t getSize = strlen(str);
for (size_t i = 0; i < 10000000; i++) {
    char * temp = new char[getSize + 1];
    int Count = 0;
    while (str[Count] != '\0') {
        temp[Count] = str[Count];
        Count++;
    }
}

The first version finishes in 420 milliseconds and the second in 650 milliseconds. Why? Both ways do the same thing! I want to use my own function, not memcpy. Is there any way to make my own loop faster (as fast as memcpy, or maybe faster)? How can I change my while loop so that it is as fast as memcpy or faster?

Full source:

#include <chrono>
#include <cstring>
#include <iostream>
using namespace std;

int main() {

    const char * str = "Hello world !";
    size_t getSize = strlen(str);

    auto start_t = chrono::high_resolution_clock::now();
    for (size_t i = 0; i < 10000000; i++) {
        char * temp = new char[getSize + 1];
        memcpy(temp, str, getSize);
    }
    cout << chrono::duration_cast<chrono::milliseconds>(chrono::high_resolution_clock::now() - start_t).count() << " milliseconds\n";


    start_t = chrono::high_resolution_clock::now();
    for (size_t i = 0; i < 10000000; i++) {
        char * temp = new char[getSize + 1];
        int done = 0;
        while (str[done] != '\0') {
            temp[done] = str[done];
            done++;
        }
    }
    cout << chrono::duration_cast<chrono::milliseconds>(chrono::high_resolution_clock::now() - start_t).count() << " milliseconds\n";

    return 0;
}

Results:

482 milliseconds
654 milliseconds

Deduplicator
myOwnWays
  • How do you measure the execution time? – Hannes Hauptmann Jul 16 '17 at 10:43
  • Relying on a `'\0'` character at the end of the array doesn't do the same as `memcpy()` does. If you only want to handle this case, you're probably better off with `strcpy()` than rolling your own function (the implementation may use certain tricks that make it even faster than yours). – user0042 Jul 16 '17 at 10:43
  • Why do you think you can outsmart the creators of your compiler's standard library? – PaulMcKenzie Jul 16 '17 at 10:43
  • Why do you want to use your own function if memcpy is faster? – melpomene Jul 16 '17 at 10:44
  • I don't want to use memcpy or strcpy. I want to write my own loop that runs until it reaches the end. – myOwnWays Jul 16 '17 at 10:45
  • Micro-optimisation is a road to nowhere. Read up on SIXSIGMA. – Ed Heal Jul 16 '17 at 10:45
  • @AlirezaSaeedipour Yes, but why? – melpomene Jul 16 '17 at 10:46
  • @AlirezaSaeedipour *i want to make my own action* -- Then that is *your* homework you've made for yourself. As you can see, writing fast functions requires much more than knowing how to write a loop. – PaulMcKenzie Jul 16 '17 at 10:47
  • For the behavior of my other programs, and also because I want to know why. Is memcpy magic? I'm sure memcpy does it this way too (checks each char and copies them one by one). – myOwnWays Jul 16 '17 at 10:48
  • I'm pretty sure that the cost of `new` dwarfs the cost of copying 14 bytes! – Richard Hodges Jul 16 '17 at 10:48
  • @myOwnWays Many compiler implementations use assembly language to copy buffers. Again, why do you think you can outsmart some of the best programmers in the industry? – PaulMcKenzie Jul 16 '17 at 10:49
  • You know, you must not drop the terminator. – Deduplicator Jul 16 '17 at 10:57
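Following the comment about the terminator, here is a minimal sketch of a duplication helper that copies `strlen(str) + 1` bytes so the `'\0'` comes along, and that frees the buffer automatically (the loops in the question never `delete[]` their allocations). The name `duplicate` is hypothetical, and returning `std::unique_ptr<char[]>` instead of a raw `new[]` pointer is an assumption, not something the question requires.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical helper: returns an owned copy of a NUL-terminated string.
// The buffer is released automatically when the unique_ptr goes out of scope.
std::unique_ptr<char[]> duplicate(const char* str) {
    const std::size_t len = std::strlen(str);
    std::unique_ptr<char[]> copy(new char[len + 1]);
    // Copy len + 1 bytes so the terminating '\0' is included.
    std::memcpy(copy.get(), str, len + 1);
    return copy;
}
```

Usage: `auto temp = duplicate(str);` then `temp.get()` is a valid C string.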

3 Answers


Replacing library functions with your own often leads to inferior performance.

memcpy represents a very fundamental memory operation, so its authors optimize it heavily. Unlike a "naïve" implementation, the library version moves more than a single byte at a time whenever possible, and uses hardware assistance on platforms where it is available.

Moreover, the compiler itself "knows" about the inner workings of memcpy and other library functions, and it can optimize the calls out completely when the length is known at compile time.

Note: your implementation has the semantics of strcpy, not memcpy.
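The loop in the question behaves like `strcpy`, but it stops before writing the `'\0'`. A minimal sketch of that loop with full strcpy semantics, written with pointers, that also copies the terminator; `my_strcpy` is a hypothetical name, and the caller is assumed to supply a destination of at least `strlen(src) + 1` bytes.

```cpp
// Hypothetical strcpy-like copy: copies bytes up to and including the '\0'.
// The caller must provide a destination of at least strlen(src) + 1 bytes.
char* my_strcpy(char* dst, const char* src) {
    char* out = dst;
    // Assign first, then test: the '\0' is written before the loop exits.
    while ((*out++ = *src++) != '\0') {
    }
    return dst;  // same convention as strcpy: return the destination
}
```

This still examines every byte, which is exactly the per-character work a length-based `memcpy` avoids.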

Sergey Kalinichenko

... both of those ways are same !

No, they aren't:

  1. memcpy() doesn't check each character to contain '\0' or not.
  2. There may be more optimizations done by the implementers than in your naive approach.

It's unlikely that your approach can be made faster than memcpy().
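To illustrate the first point: with `memcpy` semantics the length is known up front, so the loop counts bytes instead of testing each one for `'\0'`. This is a naive sketch under that assumption; `copy_bytes` is a hypothetical name, and it does none of the multi-byte tricks a library `memcpy` may use.

```cpp
#include <cstddef>

// Hypothetical memcpy-like copy: exactly n bytes, no check for '\0'.
void* copy_bytes(void* dst, const void* src, std::size_t n) {
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i < n; ++i) {
        d[i] = s[i];
    }
    return dst;
}
```

Called as `copy_bytes(temp, str, getSize + 1)`, it copies the terminator too. An optimizing compiler may recognize a loop like this and turn it into a `memcpy` call anyway.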

user0042

Seeing that you didn't use pointers, and that you are comparing what you are doing (strcpy) with memcpy, it is clear that you are a beginner. As everyone else has already said, it is difficult to outsmart experienced programmers like those who wrote your library.

But I'm going to give you some hints for optimizing your code. I took a quick look at Microsoft's C standard library implementation (dubbed the C Runtime Library), and they do it in assembly, which can be faster than doing it in C. So that is one point for speed.

On most 32-bit architectures with 32-bit buses, the CPU can fetch 32 bits of information from memory in one request (assuming that the data is properly aligned), but even if you need only 16 or 8 bits, it still has to make that one request. So working in units of your machine's word size will probably give you some speed-up.
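A sketch of the word-size idea, assuming a known length (memcpy semantics): copy 8 bytes per step through a `std::uint64_t`, then finish the tail byte by byte. `copy_words` is a hypothetical name. Any speed-up assumes a 64-bit target; the code itself is correct anywhere `std::uint64_t` exists. The 8-byte loads and stores go through `std::memcpy` with a constant size, the usual portable way to avoid alignment and strict-aliasing problems; compilers typically lower such calls to a single move instruction rather than a library call.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical word-at-a-time copy of exactly n bytes.
void copy_words(char* dst, const char* src, std::size_t n) {
    std::size_t i = 0;
    // Main loop: 8 bytes per iteration through a 64-bit word.
    for (; i + sizeof(std::uint64_t) <= n; i += sizeof(std::uint64_t)) {
        std::uint64_t word;
        std::memcpy(&word, src + i, sizeof word);  // alignment-safe load
        std::memcpy(dst + i, &word, sizeof word);  // alignment-safe store
    }
    // Tail: the remaining 0..7 bytes, one at a time.
    for (; i < n; ++i) {
        dst[i] = src[i];
    }
}
```

For `"Hello World !"` with its terminator (14 bytes), this does one 8-byte step and six single-byte steps.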

Lastly, I want to direct your attention to SIMD. If your CPU provides it, you can use it to gain extra speed. Again, the MSCRT has some SSE2 optimization options.
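As a sketch of the SIMD hint, assuming an x86 target with SSE2: copy 16 bytes per step with the unaligned intrinsics `_mm_loadu_si128` and `_mm_storeu_si128` from `<emmintrin.h>`, then handle the tail byte by byte. `copy_sse2` is a hypothetical name. On targets without SSE2 the preprocessor guard drops the vector loop and the function falls back to the plain byte loop. It is not a reconstruction of what the MSCRT actually does.

```cpp
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics (x86 / x86-64 only)
#endif

// Hypothetical copy of exactly n bytes using 16-byte SSE2 moves when available.
void copy_sse2(char* dst, const char* src, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE2__) || defined(_M_X64)
    // 16 bytes per iteration; loadu/storeu do not require 16-byte alignment.
    for (; i + 16 <= n; i += 16) {
        __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), chunk);
    }
#endif
    // Tail (or the whole copy on non-SSE2 targets): byte by byte.
    for (; i < n; ++i) {
        dst[i] = src[i];
    }
}
```

For a 14-byte string like the one in the question, the vector loop never runs, which is another reason the benchmark mostly measures `new`.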

From time to time in the past, I have had to write code that outperforms my library's implementation, because I had a specific need or a specific type of data that I could optimize for. While re-implementing library functions might have some educational value, unless it is specifically needed, your time is better spent on your actual code.

m0h4mm4d