Clearing memory securely and reallocations

Question

Following the discussion here, if you want to have a secure class for storing sensitive information (e.g passwords) on memory, you have to:

memset/clear the memory before freeing it
reallocations must also follow the same rule - instead of using realloc, use malloc to create a new memory region, copy the old to the new, and then memset/clear the old memory before freeing it finally

So this sounds good, and I created a test class to see if this works. So I made a simple test case where I keep adding the words "LOL" and "WUT", followed by a number to this secure buffer class around a thousand times, destroying that object, before finally doing something that causes a core dump.

Since the class is supposed to securely clear the memory before the destruction, I'm not supposed to be able to find a "LOLWUT" on the coredump. However, I managed to find them still, and wondered if my implementation is just buggy. However, I tried the same thing using CryptoPP library's SecByteBlock:

#include <cryptopp/osrng.h>
#include <cryptopp/dh.h>
#include <cryptopp/sha.h>
#include <cryptopp/aes.h>
#include <cryptopp/modes.h>
#include <cryptopp/filters.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
using namespace std;

int main(){
   {
      CryptoPP::SecByteBlock moo;

      int i;
      for(i = 0; i < 234; i++){
         moo += (CryptoPP::SecByteBlock((byte*)"LOL", 3));
         moo += (CryptoPP::SecByteBlock((byte*)"WUT", 3));

         char buffer[33];
         sprintf(buffer, "%d", i);
         string thenumber (buffer);

         moo += (CryptoPP::SecByteBlock((byte*)thenumber.c_str(), thenumber.size()));
      }

      moo.CleanNew(0);

   }

   sleep(1);

   *((int*)NULL) = 1;

   return 0;
}

And then compile using:

g++ clearer.cpp -lcryptopp -O0

And then enable core dump

ulimit -c 99999999

But then, enabling core dump and running it

./a.out ; grep LOLWUT core ; echo hello

gives the following output

Segmentation fault (core dumped)
Binary file core matches
hello

What is causing this? Did the whole memory region for the application realloc itself, because of the reallocation caused by SecByteBlock's append?

Also, This is SecByteBlock's Documentation

edit: After checking the core dump using vim, I got this: https://i.stack.imgur.com/3iXb3.jpg

edit2: updated code so it's more readily compilable, and compilation instructions

final edit3: It looks like memcpy is the culprit. See Rasmus' mymemcpy implementation on his answer below.

I don't think it's the reason for what you're seeing, but you do know that calling `memset` on some memory doesn't prevent there still being a copy of it in some swap file somewhere? And generally, the result of the `memset` need not penetrate all layers of cache, but I mention swap file because it's the most persistent and hence the most blatantly dangerous place for sensitive data. — Steve Jessop, May 21 '12 at 11:02
@SteveJessop hmm, that's probably why windows has SecureZeroMemory or something like that. I wonder what the linux/posix equivalent would be.. if there is. — kamziro, May 21 '12 at 11:10
A sufficiently aggressive optimizer might remove `memset` to blocks that are never read again. `SecureZeroMemory` is there for a reason! — Bo Persson, May 21 '12 at 15:14
My kingdom for a posix/linux/iOS version of securezeromemory then! — kamziro, May 21 '12 at 15:36
@kamziro: I'm pretty sure that Linux and iOS will both always clear your memory for you before letting another process see it. If not then it's because you've switched something off when compiling the kernel :-) Ultimately what you're fighting over is the duration of the window within which an attacker could see the contents of the memory by provoking and reading a core dump, directly inspecting RAM, etc. The window either exists or doesn't no matter what you do, since there's normally no reason an attacker could do that stuff *after* you clear it but not before. — Steve Jessop, May 21 '12 at 17:55
Which version of gcc and cryptopp are you using? Which platform? Which optimization flags? I cannot reproduce the problem on my local Linux box (gcc-4.5.3, cryptopp-5.6.1, x86, -O2). — Rasmus Faber, May 21 '12 at 18:21
@RasmusFaber ubuntu 10.04.2 LTS, GCC 4.6.2, cryptopp-5.6.1, -O0, x86 -- and I'm gonna update the testing code to make that __toString a standard one instead — kamziro, May 22 '12 at 02:20
@SteveJessop good to know that they'll be cleared before other processes can see it, but I'm not worried about other processes. I'm worried about someone cryogenically freezing the RAM and then reading the whole memory, as they apparently did (googled cryogenically frozen rams). That, or DMA hacks. — kamziro, May 22 '12 at 02:21
@kamziro: yeah, I think the fact that firewire is one gigantic DMA hack is a bigger practical risk than dunking RAM in liquid nitrogen. So your program is going to delete the memory, then tell the user "it is now safe to leave your PC next to some shifty-looking dude clutching a vat of coolant", or words to that effect? Fair enough, as far as it goes :-) — Steve Jessop, May 22 '12 at 02:25
@SteveJessop ha, fair enough, but it's more likely that someone could steal my iOS device, take a 2 minute drive into their little criminal laboratory and put the device on a coolant. And if they can do that, the DMA hack is more practical. Besides, I think law enforcement agencies already have specialized devices to do this kind of DMA forensics thing. — kamziro, May 22 '12 at 02:27
let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/11545/discussion-between-kamziro-and-steve-jessop) — kamziro, May 22 '12 at 02:42

andrewdotn · Answer 1 · 2013-01-29T22:30:19.937

Despite showing up in the coredump, the password isn’t actually in memory anymore after clearing the buffers. The problem is that memcpying a sufficiently long string leaks the password into SSE registers, and those are what show up in the coredump.

When the size argument to memcpy is greater than a certain threshold—80 bytes on the mac—then SSE instructions are used to do the memory copying. These instructions are faster because they can copy 16 bytes at a time in parallel instead of going character-by-character, byte-by-byte, or word-by-word. Here’s the key part of the source code from Libc on the mac:

LAlignedLoop:               // loop over 64-byte chunks
    movdqa  (%rsi,%rcx),%xmm0
    movdqa  16(%rsi,%rcx),%xmm1
    movdqa  32(%rsi,%rcx),%xmm2
    movdqa  48(%rsi,%rcx),%xmm3

    movdqa  %xmm0,(%rdi,%rcx)
    movdqa  %xmm1,16(%rdi,%rcx)
    movdqa  %xmm2,32(%rdi,%rcx)
    movdqa  %xmm3,48(%rdi,%rcx)

    addq    $64,%rcx
    jnz     LAlignedLoop

    jmp     LShort                  // copy remaining 0..63 bytes and done

%rcx is the loop index register, %rsi is the source address register, and %rdi is the destination address register. Each run around the loop, 64 bytes are copied from the source buffer to the 4 16-byte SSE registers xmm{0,1,2,3}; then the values in those registers are copied to the destination buffer.

There’s a lot more stuff in that source file to make sure that copies occur only on aligned addresses, to fill in the part of the copy that’s leftover after doing 64-byte chunks, and to handle the case where source and destination overlap.

However—the SSE registers are not cleared after use! That means 64 bytes of the buffer that was copied is still present in the xmm{0,1,2,3} registers.

Here’s a modification of Rasmus’s program that shows this:

#include <ctype.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <emmintrin.h>

inline void SecureWipeBuffer(char* buf, size_t n){
  volatile char* p = buf;
  asm volatile("rep stosb" : "+c"(n), "+D"(p) : "a"(0) : "memory");
}

int main(){
  const size_t size1 = 200;
  const size_t size2 = 400;

  char* b = new char[size1];
  for(int j=0;j<size1-10;j+=10){
    memcpy(b+j, "LOL", 3);
    memcpy(b+j+3, "WUT", 3);
    sprintf((char*) (b+j+6), "%d", j);
  }
  char* nb = new char[size2];
  memcpy(nb, b, size1);
  SecureWipeBuffer(b,size1);
  SecureWipeBuffer(nb,size2);

  /* Password is now in SSE registers used by memcpy() */
  union {
    __m128i a[4];
    char c;
  };
  asm ("MOVDQA %%xmm0, %0": "=x"(a[0]));
  asm ("MOVDQA %%xmm1, %0": "=x"(a[1]));
  asm ("MOVDQA %%xmm2, %0": "=x"(a[2]));
  asm ("MOVDQA %%xmm3, %0": "=x"(a[3]));
  for (int i = 0; i < 64; i++) {
      char p = *(&c + i);
      if (isprint(p)) {
        putchar(p);
      } else {
          printf("\\%x", p);
      }
  }
  putchar('\n');

  return 0;
}

On my mac, this prints:

0\0LOLWUT130\0LOLWUT140\0LOLWUT150\0LOLWUT160\0LOLWUT170\0LOLWUT180\0\0\0

Now, examining the core dump, the password only occurs one single time, and as that exact 0\0LOLWUT130\0...180\0\0\0 string. The core dump has to contain a copy of all registers, which is why that string is there—it’s the values of the xmm{0,1,2,4} registers.

So the password isn’t actually in RAM anymore after calling SecureWipeBuffer, it only appears to be because it is actually in some registers that only appear in the coredump. If you’re worried about memcpy having a vulnerability that could be exploited by RAM-freezing, worry no more. If having a copy of the password in registers bothers you, use a modified memcpy that doesn’t use the SSE2 registers, or clears them when it’s done. And if you’re really paranoid about this, keep testing your coredumps to make sure the compiler isn’t optimizing away your password-clearing code.

I'm facing a similar problem when dealing with secure/trusted environments. I was aware that some optimizations _can_ indeed wipe away the memset. For what it's worth, here's another solution from the Intel SGX SDK that uses a volatile pointer to a custom memset function: https://github.com/intel/linux-sgx/blob/56faf11a455b06fedd6adc3e60b71f6faf05dc0f/common/src/sgx_memset_s.cpp — X99, Jun 09 '21 at 14:25

Rasmus Faber · Accepted Answer · 2012-05-22T08:31:56.157

10

Here is another program that reproduces the problem more directly:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

inline void SecureWipeBuffer(char* buf, size_t n){
  volatile char* p = buf;
  asm volatile("rep stosb" : "+c"(n), "+D"(p) : "a"(0) : "memory");
}

void mymemcpy(char* b, const char* a, size_t n){
  char* s1 = b;
  const char* s2= a;
  for(; 0<n; --n) *s1++ = *s2++;
}

int main(){
  const size_t size1 = 200;
  const size_t size2 = 400;

  char* b = new char[size1];
  for(int j=0;j<size1-10;j+=10){
    memcpy(b+j, "LOL", 3);
    memcpy(b+j+3, "WUT", 3);
    sprintf((char*) (b+j+6), "%d", j);
  }
  char* nb = new char[size2];
  memcpy(nb, b, size1);
  //mymemcpy(nb, b, size1);
  SecureWipeBuffer(b,size1);
  SecureWipeBuffer(nb,size2);

  *((int*)NULL) = 1;

  return 0;    
}

If you replace memcpy with mymemcpy or use smaller sizes the problem goes away, so my best guess is that the builtin memcpy does something that leaves part of the copied data in memory.

I guess this just shows that clearing sensitive data from memory is practically impossible unless it is designed into the entire system from scratch.

edited May 22 '12 at 08:31

answered May 22 '12 at 07:53

Rasmus Faber

48,631
24
141
189

Curses, that's unfortunate. I guess that rules out marketing my app with "Secure against RAM forensics!" – kamziro May 22 '12 at 08:07
Actually, your mymemcpy seems to have solved the problem. It seems like the standard really was the thing causing the dispersion of the memory into the wild! – kamziro May 22 '12 at 08:45
@kamziro. But strangely enough, using -fno-builtin-memcpy does not work. – Rasmus Faber May 22 '12 at 08:51
2

@kamziro: I have created [a ticket](https://sourceforge.net/apps/trac/cryptopp/ticket/18) in Crypto++'s bugtracker for good measure. – Rasmus Faber May 22 '12 at 08:56

score 2 · Answer 3 · edited May 23 '17 at 11:46

2

The string literals will be stored in memory and not managed by the SecByteBlock class.

This other SO question does a decent job of explaining it: Is a string literal in c++ created in static memory?

You can try and confirm whether the grep matches can be accounted for by the string literals by seeing how many matches you get. You could also print out the memory locations of the SecByteBlock buffers and try to see if they correspond with the locations in the core dump that match your marker.

edited May 23 '17 at 11:46

Community

1
1

answered May 22 '12 at 02:43

kennbrodhagen

4,258
2
26
21

The core dump memory match looks something like this: "@^@^@^@^@^@^@^@^@^@^@^@^@0LOLWUT231LOLWUTOLWUT229LOLWUT23216LOLWUT217LOLWUT218LOLWUT219LOLWUT220LOLWUT221LOLWUT222LOLWUT223LOLWUT224LOLWUT225LOLWUT226LOL^@^@^@^@^@^@^@^@^" (from vim), so they are definitely from the secbyteblock. Setting the loop to zero elements gives no matches. The setting to zero is also done by using user input, so it's definitely not "compiler optimized out" either – kamziro May 22 '12 at 02:48

score 2 · Answer 4 · answered Jan 29 '13 at 00:14

Without inspecting the details of memcpy_s, I suspect that what you are seeing is a temporary stack buffer used by memcpy_s to copy out small memory buffers. You can verify this by running in a debugger and seeing if LOLWUT shows up when viewing stack memory.

[The implementation of reallocate in Crypto++ uses memcpy_s when resizing memory allocations, which is why you would be able to find some number of LOLWUT strings in memory. Also, the fact that many different LOLWUT strings overlap in that dump suggest that it's a temporary buffer that's being reused.]

The custom version of memcpy that is just a simple loop does not require temporary storage beyond counters, so that would certainly be more secure than how memcpy_s is implemented.

+1. memset_s is the only spec conforming way to clear buffers securely. It also implies that prior to C++11, there's no way to clear buffers securely (i.e. while conforming to spec). — ishaq, Aug 10 '18 at 18:28

score 0 · Answer 5 · answered Jan 29 '13 at 14:12

0

I would suggest that the way to do it is to encrypt the data in memory. In that way, the data is always secure whether it is still in memory or not. The drawback, of course, is an overhead in terms of encrypting/decrypting the data each time it is accessed.

answered Jan 29 '13 at 14:12

Jack Aidley

19,439
7
43
70

The compiler "smart" enough to optimize out `memset` on memory you are about to `free`, might be able to remove the last encryption as well. Or the next compiler version might. – Bo Persson Jan 29 '13 at 15:44
I'm suggesting you only encrypt/decrypt the specific section you're currently reading/writing. – Jack Aidley Jan 29 '13 at 15:50

Clearing memory securely and reallocations

5 Answers5

Linked