76

I was trying to figure out hands-on how tail calls are handled by the C# compiler.

(Answer: They're not. But the 64bit JIT(s) WILL do TCE (tail call elimination). Restrictions apply.)

So I wrote a small test using a recursive call that prints how many times it gets called before the StackOverflowException kills the process.

class Program
{
    static void Main(string[] args)
    {
        Rec();
    }

    static int sz = 0;
    static Random r = new Random();
    static void Rec()
    {
        sz++;

        //uncomment for faster, more imprecise runs
        //if (sz % 100 == 0)
        {
            //some code to keep this method from being inlined
            var zz = r.Next();  
            Console.Write("{0} Random: {1}\r", sz, zz);
        }
        
        //uncommenting this stops TCE from happening
        //else
        //{
        //    Console.Write("{0}\r", sz);
        //}

        Rec();
    }

Right on cue, the program ends with SO Exception on any of:

  • 'Optimize build' OFF (either Debug or Release)
  • Target: x86
  • Target: AnyCPU + "Prefer 32 bit" (this is new in VS 2012 and the first time I saw it. More here.)
  • Some seemingly innocuous branch in the code (see commented 'else' branch).

Conversely, using 'Optimize build' ON + (Target = x64 or AnyCPU with 'Prefer 32bit' OFF (on a 64bit CPU)), TCE happens and the counter keeps spinning up forever (ok, it arguably spins down each time its value overflows).

But I noticed a behaviour I can't explain in the StackOverflowException case: it never (?) happens at exactly the same stack depth. Here are the outputs of a few 32-bit runs, Release build:

51600 Random: 1778264579
Process is terminated due to StackOverflowException.

51599 Random: 1515673450
Process is terminated due to StackOverflowException.

51602 Random: 1567871768
Process is terminated due to StackOverflowException.

51535 Random: 2760045665
Process is terminated due to StackOverflowException.

And Debug build:

28641 Random: 4435795885
Process is terminated due to StackOverflowException.

28641 Random: 4873901326  //never say never
Process is terminated due to StackOverflowException.

28623 Random: 7255802746
Process is terminated due to StackOverflowException.

28669 Random: 1613806023
Process is terminated due to StackOverflowException.

The stack size is constant (defaults to 1 MB). The stack frames' sizes are constant.

So then, what can account for the (sometimes non-trivial) variation of stack depth when the StackOverflowException hits?

UPDATE

Hans Passant raises the issue of Console.WriteLine touching P/Invoke, interop and possibly non-deterministic locking.

So I simplified the code to this:

class Program
{
    static void Main(string[] args)
    {
        Rec();
    }
    static int sz = 0;
    static void Rec()
    {
        sz++;
        Rec();
    }
}

I ran it in Release/32bit/Optimization ON without a debugger. When the program crashes, I attach the debugger and check the value of the counter.

And it still isn't the same on several runs. (Or my test is flawed.)

UPDATE: Closure

As suggested by fejesjoco, I looked into ASLR (Address space layout randomization).

It's a security technique that makes it hard for buffer overflow attacks to find the precise location of (e.g.) specific system calls, by randomizing various things in the process address space, including the stack position and, apparently, its size.

The theory sounds good. Let's put it into practice!

In order to test this, I used a Microsoft tool specific for the task: EMET or The Enhanced Mitigation Experience Toolkit. It allows setting the ASLR flag (and a lot more) on a system- or process-level.
(There is also a system-wide, registry hacking alternative that I didn't try)

EMET GUI

In order to verify the effectiveness of the tool, I also discovered that Process Explorer duly reports the status of the ASLR flag in the 'Properties' page of the process. Never saw that until today :)

enter image description here

Theoretically, EMET can (re)set the ASLR flag for a single process. In practice, it didn't seem to change anything (see above image).

However, I disabled ASLR for the entire system and (one reboot later) I could finally verify that indeed, the SO exception now always happens at the same stack depth.

BONUS

ASLR-related, in older news: How Chrome got pwned

Cristian Diaconescu
  • 34,633
  • 32
  • 143
  • 233
  • 1
    I have edited your title. Please see, "[Should questions include “tags” in their titles?](http://meta.stackexchange.com/questions/19190/)", where the consensus is "no, they should not". – John Saunders Nov 27 '13 at 15:02
  • FYI: just tried without `Random` and only print `sz`. The same happens. – Julián Urbano Nov 27 '13 at 15:20
  • I wonder what's a technique to find out if the JIT has inlined a method call or not. – Cristian Diaconescu Nov 27 '13 at 15:37
  • 1
    @CristiDiaconescu Attach a debugger in visual studio after the JIT would have compiled the code (via the dropdown `Debug->Attach to process` or putting a `Debugger.Attach()` in your code) then go to the dropdown menu `Debug->Windows->Disassembly` to see the machine code that the JIT created. Remember that the JIT compiles code differently if you have a debugger attached or not, so be sure to start it without the debugger attached. – Scott Chamberlain Nov 27 '13 at 15:55
  • 14
    +1 For posting a question that's actually on topic for StackOverflow. Ridiculous how many people post questions that aren't about stack overflows at all! – Ben Lee Dec 03 '13 at 20:32
  • That has nothing to do with security. If you have a 47 bit userspace and 12 bits are static for a stack, that's still 35 bits, i.e. 34,3 billion placements, for the start page of a stack. And it doesn't help for SMT cacheline-aliasing either because caches have sufficient associativity today. Linux doesn't do such stupid things by default. – Bonita Montero May 06 '22 at 04:47

3 Answers3

53

I think it may be ASLR at work. You can turn off DEP to test this theory.

See here for a C# utility class to check memory information: https://stackoverflow.com/a/8716410/552139

By the way, with this tool, I found that the difference between the maximum and minimum stack size is around 2 KiB, which is half a page. That's weird.

Update: OK, now I know I'm right. I followed up on the half-page theory, and found this doc that examines the ASLR implementation in Windows: http://www.symantec.com/avcenter/reference/Address_Space_Layout_Randomization.pdf

Quote:

Once the stack has been placed, the initial stack pointer is further randomized by a random decremental amount. The initial offset is selected to be up to half a page (2,048 bytes)

And this is the answer to your question. ASLR takes away between 0 and 2048 bytes of your initial stack randomly.

Community
  • 1
  • 1
fejesjoco
  • 11,763
  • 3
  • 35
  • 65
0

This C++11 code prints the offset of the stack within the start page:

#include <Windows.h>
#include <iostream>

using namespace std;

#if !defined(__llvm__)
    #pragma warning(disable: 6387) // handle could be NULL
    #pragma warning(disable: 6001) // using uninitialized memory
#endif

int main()
{
    SYSTEM_INFO si;
    GetSystemInfo( &si );
    static atomic<size_t> aPageSize = si.dwPageSize;
    auto theThread = []( LPVOID ) -> DWORD
    {
        size_t pageSize = aPageSize.load( memory_order_relaxed );
        return (DWORD)(pageSize - ((size_t)&pageSize & pageSize - 1));
    };
    constexpr unsigned ROUNDS = 10;
    for( unsigned r = ROUNDS; r--; )
    {
        HANDLE hThread = CreateThread( nullptr, 0, theThread, nullptr, 0, nullptr );
        WaitForSingleObject( hThread, INFINITE );
        DWORD dwExit;
        GetExitCodeThread( hThread, &dwExit );
        CloseHandle( hThread );
        cout << dwExit << endl;
    }
}

Linux doesn't randomize the lower 12 bits per default:

#include <iostream>
#include <atomic>
#include <pthread.h>
#include <unistd.h>

using namespace std;

int main()
{
    static atomic<size_t> aPageSize = sysconf( _SC_PAGESIZE );
    auto theThread = []( void *threadParam ) -> void *
    {
        size_t pageSize = aPageSize.load( memory_order_relaxed );
        return (void *)(pageSize - ((size_t)&pageSize & pageSize - 1));
    };
    constexpr unsigned ROUNDS = 10;
    for( unsigned r = ROUNDS; r--; )
    {
        pthread_t pThread;
        pthread_create( &pThread, nullptr, theThread, nullptr );
        void *retVal;
        pthread_join( pThread, &retVal );
        cout << (size_t)retVal << endl;
    }
}

The issue here is that randomizing the thread stack's starting address within a page doesn't make sense from a security standpoint. The issue is simply that when you have a 64 bit system with 47 bit userspace (on newer Intel-CPUs you even have 55 bit userspace) you have still 35 bits to randomize, i.e. about 34 billion placements of a stack. And it doesn't make sense from a performance standpoint either since cacheline aliasing on SMT-systems can't happen because caches have enough associativity today.

Bonita Montero
  • 2,817
  • 9
  • 22
  • In what way do you think it doesn't make sense? There are plenty of attacks that need to know the exact offset from the stack to some other memory. Making these offsets less predictable is a win IMHO. – Jeremy Lakeman May 06 '22 at 05:44
  • With a 47 bit userspace there are still left 35 bits of randomization if you don't randomize the lower 12 bits. This makes attacks with arbitrary stack addresses impossible. – Bonita Montero May 07 '22 at 06:03
  • You're assuming that the entropy of a pointer to heap memory is the only security concern that will ever matter. Are you saying that making the stack alignment un-predictable will never have any value to mitigate an attack? Or are you concerned that any such value is outweighed by the cost of implementation? Are you certain that there are no hypothetical attacks that can be performed on stack memory, but require knowledge of the alignment? – Jeremy Lakeman May 09 '22 at 02:54
-3

Change r.Next() to r.Next(10). StackOverflowExceptions should occur in the same depth.

Generated strings should consume the same memory because they have the same size. r.Next(10).ToString().Length == 1 always. r.Next().ToString().Length is variable.

The same applies if you use r.Next(100, 1000)

Ahmed KRAIEM
  • 10,267
  • 4
  • 30
  • 33