
I know this sounds weird, but I don't want to see MSVCR120.dll in my program's IAT. It is always a pain when running the program on a new computer that doesn't have this DLL installed.

After some Googling I found #pragma intrinsic(memcpy), which seems to be designed for my problem, but it actually does NOT solve it.

Here is a small piece of code for demonstration:

#pragma intrinsic(memcpy)
#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
#include <windows.h>

void main(int argc,char **argv)
{
    // simple cat implementation
    if (argc>1)
    {
        FILE *f = fopen(argv[1], "rb");
        fseek(f, 0, SEEK_END);
        DWORD size = ftell(f);
        fseek(f, 0, SEEK_SET);
        char *buf = (char*)malloc(size);
        fread_s(buf, size, 1, size, f);
        // below is nonsense , but for demonstration
        char *buf2 = new char[size+1];
        memcpy(buf2, buf, size); // this memcpy is **NOT** inlined!
        puts(buf2);
    }
}

Yes, I do know there is a memcpy function implemented in ntdll and that I can use it via GetProcAddress, but right now I want to find out why #pragma intrinsic(memcpy) does NOT work at all.

The code above generates something like this:

[Screenshot: Assembly]

This memcpy is actually a wrapper that calls the real memcpy function in msvcr120.dll.

[Screenshot: wrapper]

And there is a memcpy in its IAT:

[Screenshot: IAT]

I'm pretty sure I have enabled intrinsic functions in my compiler options:

[Screenshot: option]

Is there a solution for this? Thanks.

EDIT: I notice that msvcrt.dll exists in almost all versions of Windows. So is there an option that lets me link against msvcrt.dll instead of msvcr120.dll?

Ayra Faceless
  • 1
    Did you try `/MT` switch? – Gaurav Sehgal Nov 09 '17 at 05:04
  • Yes, the /MT switch will statically link the msvcrt120.lib, which makes my program much larger. I'm just wondering why `#pragma intrinsic(memcpy)` doesn't work. – Ayra Faceless Nov 09 '17 at 05:26
  • You can write your own `memcpy`, but the compiler might recognize a loop to copy bytes and replace it with a `memcpy` call (which is a good optimization because a well-implemented `memcpy` can be faster than a simple loop to copy bytes). Compilers can even do this with other sequences of code—recognizing them and replacing them with standard or special library routines. If your compiler does this, what you really need is to check its documentation to see if there is a switch to tell it not to insert library calls. (Compilers may have such switches to support writing kernel code, for example.) – Eric Postpischil Nov 09 '17 at 05:36
  • Note: `puts(buf2);` attempts to print a potential non-string. Suggest allocating +1 and appending a null character. – chux - Reinstate Monica Nov 09 '17 at 05:39
  • I do not see a switch for this in the [Visual Studio documentation](https://msdn.microsoft.com/en-us/library/9s7c9wdw.aspx). There is a switch for generating code to run in the kernel, but it may do more than you want. One option might be to write your own versions of `memcpy` and any other routines you use from the library and link them in with your other modules so that the references are resolved by your versions, so the library is not needed. – Eric Postpischil Nov 09 '17 at 05:41
  • Thanks Eric. It is easy to implement a memcpy because you will find its source in your own SDK. Just copy the code and rename the function to something else, and it all works fine. It seems the only way to get rid of msvcr120.dll is to implement everything in your own code. – Ayra Faceless Nov 09 '17 at 05:51
  • Thanks @chux . This may cause a memory leak. – Ayra Faceless Nov 09 '17 at 05:54
  • 2
    If it's about not needing MSVCR120.dll on the computer to run the program, what's wrong with linking statically? Of course the program will be larger, but you can't have your cake and eat it. – Jabberwocky Nov 09 '17 at 06:12
  • Thanks @MichaelWalz. Because linking statically also pulls in a large amount of code that is not related to my program. – Ayra Faceless Nov 09 '17 at 06:17
  • 2
    Large .exe files are not really a problem nowadays. – Jabberwocky Nov 09 '17 at 06:25
  • Yeah, it's not a big problem lol, maybe I am a perfectionist. @MichaelWalz – Ayra Faceless Nov 09 '17 at 06:36
  • 2
    Once you've solved memcpy, what about fopen, fseek? – M.M Nov 09 '17 at 06:50
  • @M.M Nope, the code above is just an example. I don't use fopen in my real program. If I wanted to do something like this I would use the CreateFile API instead. – Ayra Faceless Nov 09 '17 at 07:57
  • As for the inlining: When you call `memcpy(buf2, buf, size);` with 3 runtime values, the intrinsic can not do anything better than calling the library function. If `size` is a compile time constant, and the alignment of `buf` and `buf2` is known, you would get totally different code. [Check here](https://stackoverflow.com/a/11639305/597607) for an example where it ends up in a single register move. So using intrinsics can work, but just not in your example. – Bo Persson Nov 09 '17 at 13:35
  • @BoPersson Well, actually the compiler will try to optimize your code, so if size is a constant, there will be no memcpy library call in release mode even if you don't add `#pragma intrinsic(memcpy)`. And when memcpy is called with 3 runtime values, the intrinsic could do better, for example by emitting the assembly instruction `rep movsb`, but it doesn't. – Ayra Faceless Nov 09 '17 at 13:52
  • @Ayra - But using `rep movsb` is very far from optimal on a processor that can move 8 or 16 bytes at a time (for aligned operands). The library function will do that. – Bo Persson Nov 09 '17 at 14:01
  • @BoPersson That's the point. I don't want to involve the library. `rep movsb` is not really fast; maybe SSE2 instructions do better. – Ayra Faceless Nov 09 '17 at 14:07
  • @Ayra - The source for memcpy.asm comes with the compiler. It is 650 lines long. Nobody but perhaps you would like to have that expanded inline. So it's not. The intrinsic handles special cases with inline code and calls the library otherwise. Your case is not one of the "special" ones. – Bo Persson Nov 09 '17 at 14:19
  • @BoPersson Thanks for answering. I have now implemented a custom memcpy in my code and it works fine. Yeah, I would like to have those 650 lines inlined, lol. – Ayra Faceless Nov 09 '17 at 14:33
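
The distinction Bo Persson draws in the comments above (a compile-time-constant size versus three runtime values) can be sketched as follows. This is only an illustration, not a statement about what any particular compiler version emits: the function names `load_u64` and `copy_runtime` are hypothetical, and the actual code generated depends on the compiler and its optimization settings, so the disassembly is the only reliable check.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: the size is a compile-time constant (8 bytes) and the
// destination is a local variable, so an optimizing compiler has enough
// information to expand the copy inline instead of calling the library.
std::uint64_t load_u64(const unsigned char *src)
{
    std::uint64_t value;
    std::memcpy(&value, src, sizeof value);
    return value;
}

// Hypothetical helper: destination, source, and size are all runtime values,
// as in the question. There is nothing to specialize on, so a call to the
// library memcpy is the likely result.
void copy_runtime(void *dest, const void *src, std::size_t size)
{
    std::memcpy(dest, src, size);
}
```

Compiling both functions with optimizations enabled and comparing their disassembly should show whether the first one still references the imported memcpy.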

1 Answer

There is another option. How about implementing your own memcpy function as

void memcpy(void *dest, void *src, unsigned int length)
{
   unsigned char * destination = (unsigned char*)dest;
   for (int len = 0; len < length; len++)
          destination[len] = ((unsigned char*)src)[len];
}
Daksh Gupta
  • 1
    Actually, the source of memcpy is in `%programfiles%\Microsoft Visual Studio 12.0\VC\crt\src\memcpy.c`. There is also a faster version of memcpy that uses SSE2 instructions. – Ayra Faceless Nov 09 '17 at 06:03
  • 1
    Besides the speed, this is also a wrong implementation. – unalignedmemoryaccess Nov 09 '17 at 07:22
  • @tilz0R Please explain what is wrong here instead of just proclaiming it. And about speed: did you test it? – Daksh Gupta Nov 09 '17 at 07:43
  • 2
    @DakshGupta First, it does NOT check if the parameters are zero. Second, the source parameter should be const. Third, the loop variable len should be unsigned as well. – Ayra Faceless Nov 09 '17 at 08:01
  • @AyraFaceless I think this answer is less about mimicking `memcpy()` correctly and more about emphasizing that an alternative can be used. It's up to you how you want to use it. – Gaurav Sehgal Nov 09 '17 at 08:04
  • @GauravSehgal Yeah, I know I can implement my own version of memcpy and use it in my code, but I just want to ask why `#pragma intrinsic(memcpy)` doesn't inline a memcpy in my program. – Ayra Faceless Nov 09 '17 at 08:09
  • @Ayra Faceless You're a good code reviewer. I wish we had more programmers like you on Stack Overflow. Good luck – Daksh Gupta Nov 09 '17 at 08:21
  • @DakshGupta Thanks :). I just copied the code from the CRT source and used it in my program, since there doesn't seem to be a better solution. – Ayra Faceless Nov 09 '17 at 08:27
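
For reference, here is a minimal sketch of a byte-copy routine that takes the review comments above into account: a const source pointer, an unsigned `size_t` length and loop index, and the destination returned as the standard memcpy does. The name `my_memcpy` is hypothetical, chosen so it does not collide with the library function. As Eric Postpischil notes under the question, a compiler may still recognize such a loop and replace it with a library call, so it is worth checking the generated code and the import table afterwards.

```cpp
#include <cstddef>

// Hypothetical replacement routine; copies `length` bytes from src to dest.
// Like the standard memcpy, it assumes the two regions do not overlap.
void *my_memcpy(void *dest, const void *src, std::size_t length)
{
    unsigned char *d = static_cast<unsigned char *>(dest);
    const unsigned char *s = static_cast<const unsigned char *>(src);

    // Unsigned index matches the unsigned length, so there is no
    // signed/unsigned comparison. A length of 0 copies nothing.
    for (std::size_t i = 0; i < length; ++i)
        d[i] = s[i];

    return dest;
}
```

This is a plain byte loop, so it will not match the speed of the tuned CRT version mentioned in the comments; it only removes the dependency for this one call.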