How to capture the time duration inside an enclave?

Question

Measuring run time is a significant aspect of performance evaluation. I want to evaluate the performance of several pieces of code inside an SGX enclave (the trusted execution environment), and I noticed that Intel SGX provides an API called 'sgx_get_trusted_time()' that lets developers obtain the current time from a trusted source. However, I ran into two problems:

1) 'sgx_create_pse_session()' must be called before 'sgx_get_trusted_time()', but it always returns the error 'SGX_ERROR_SERVICE_UNAVAILABLE'. I have installed and configured the SGX SDK and PSW correctly (otherwise I could not use the Remote Attestation Service). I also tried updating the Management Engine on the development platform (Win10 + ThinkPad x270 + Core i5), but it didn't work;

2) The API returns the time in seconds, which is far too coarse for performance evaluation, especially when the time elapsed between two API calls is trivial.

How can I fix the first problem, and is there any way to measure the time elapsed inside the enclave more precisely? I'd appreciate any suggestions or hints.

Can you record start/stop timestamps before/after code that calls into the SGX enclave? Code outside the enclave should be able to get precise wall-clock time for one call, including all SGX overhead. (I'm here for the intel/x86 tags; I don't know if that makes sense for how SGX works. But if calls into the enclave are synchronous like making a system call is, you can time how long it takes to return.) — Peter Cordes, May 03 '20 at 08:55
Thanks for the answer, but I'm afraid it does not resolve my problem, since I need to dive into the enclave and record the run time of particular operations. Reading trusted time from the TPM seems to be a possible solution (I haven't tried it yet). — tuziYou, May 04 '20 at 01:14
Does the `rdtsc` instruction work inside an enclave? If so, you could record that inside the enclave and put it somewhere you can see from outside. [How to get the CPU cycle count in x86\_64 from C++?](https://stackoverflow.com/q/13772567). It's about as fine-grained as you can get on x86 (only `rdpmc` for hardware performance counters is lower overhead), but it still has overhead of dozens of clock cycles. And the TSC ticks at constant frequency so you need to control for turbo / idle clocks. See my answer on that question. — Peter Cordes, May 04 '20 at 01:20
Thanks again. I tried to use the instruction you recommended, but it seems it is not allowed to be invoked inside the enclave due to the strict limitations of SGX. Is it an instruction belongs to the standard C library? I ask because Intel SGX only allows developers to use the standard C library and specific libraries that Intel has carefully revised (e.g. sgx-ssl, the SGX-enabled OpenSSL library) inside the enclave. — tuziYou, May 06 '20 at 01:56
*Is it an instruction belongs to the standard C library?* That question doesn't even make sense. It's an assembly language / machine-code instruction (https://www.felixcloutier.com/x86/rdtsc), like `add eax, [rdi]`, not a function call to other code. However, it is microcoded so it could be handled specially depending on mode. (It is or can be for virtual machines). https://software.intel.com/en-us/forums/intel-software-guard-extensions-intel-sgx/topic/743186 says *RDTSC and RDTSCP are legal inside an enclave for processors that support SGX2 (subject to the value of CR4.TSD).* — Peter Cordes, May 06 '20 at 03:59
Thanks. I'm a novice at assembly language, so it is a little difficult for me to bridge developing an SGX application (which is always done in C/C++) and assembly language. I have two more questions: 1. Does using rdtsc mean I need to develop the SGX application in assembly language? 2. The processor is required to support SGX2, without which RDTSC/RDTSCP cannot be used, right? In addition, the instruction does not seem to provide a trusted time. I guess it would be better to turn to the Intel Developer Zone for further help. Thanks for your suggestions! — tuziYou, May 07 '20 at 09:30
You can think of C++ compilers as a convenient way to generate machine code, since I *do* actually know how assembly works. The CPU is always actually running machine code, regardless of whether the compiler generated it from plain C++ like `int foo = a + b;`, from intrinsics like `long foo = __rdtsc();`, or from inline assembly. So no, you don't need to write the program in asm! 2. Yes, in the end it's just a matter of which CPU instructions are allowed when executing in SGX mode. Apparently that doesn't always include `rdtsc`. — Peter Cordes, May 07 '20 at 09:32
As far as whether you can trust the TSC: you're just doing this for benchmarking on your dev machine, right? If you aren't doing anything weird, `__rdtsc()` will always increment by 1 per "reference cycle". The absolute value starts at `0` on CPU reset, if nothing has modified it since then. Intel/AMD HW virtualization can scale and offset the guest TSC, and I think the kernel can write the TSC to reset it via an MSR (model-specific register). But that's not a possible attack vector on your real code if you remove the rdtsc benchmarking code from the real application. — Peter Cordes, May 07 '20 at 09:41
Thanks again. So the rdtsc instruction is emitted by the compiler (assuming the processor supports SGX2)? If so, how should I use it to record the execution time of code; in other words, what would the rdtsc benchmarking code look like? Code or pseudo-code is welcome. — tuziYou, May 07 '20 at 09:55
Same as you'd use for any clocksource: `uint64_t start = __rdtsc();` / do stuff / `uint64_t duration = __rdtsc() - start;`. See [How to get the CPU cycle count in x86\_64 from C++?](https://stackoverflow.com/q/13772567) for more, including some caveats about it in my answer. If "stuff" is very short, you might want to make sure execution of earlier instructions has finished by using `_mm_lfence(); duration = __rdtsc() - start;`, otherwise out-of-order exec could run rdtsc before the work you're timing is finished. The time will be in "reference cycles", not absolute nanoseconds. — Peter Cordes, May 07 '20 at 10:03
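The pattern in the comment above can be sketched as follows. This is a minimal illustration for ordinary (untrusted) x86-64 code, not something confirmed to build inside an enclave; `busy_sum` and `time_busy_sum_cycles` are hypothetical names introduced here, and the result is in TSC reference cycles, not nanoseconds.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>     // MSVC: __rdtsc, _mm_lfence
#else
#include <x86intrin.h>  // GCC/Clang: __rdtsc, _mm_lfence
#endif

// Hypothetical workload standing in for the code being timed.
inline std::uint64_t busy_sum(std::uint64_t n) {
    volatile std::uint64_t acc = 0;  // volatile keeps the loop from being optimized away
    for (std::uint64_t i = 0; i < n; ++i) {
        acc = acc + i;
    }
    return acc;
}

// Returns the elapsed TSC reference cycles for one run of busy_sum(n).
inline std::uint64_t time_busy_sum_cycles(std::uint64_t n) {
    _mm_lfence();  // wait for earlier instructions before reading the start value
    const std::uint64_t start = __rdtsc();
    _mm_lfence();  // keep the timed work from starting before rdtsc has executed

    busy_sum(n);

    _mm_lfence();  // make sure the timed work has finished before the second read
    const std::uint64_t stop = __rdtsc();
    return stop - start;
}
```

Calling `time_busy_sum_cycles(100000)` several times and taking the minimum or median is usually more informative than a single reading, since individual runs are disturbed by interrupts and frequency changes.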
Thank you so much for such a detailed answer. One more question: when I add the code you suggested (i.e., `uint64_t start = __rdtsc();`), a header file is required. According to the link https://stackoverflow.com/q/13772567, the header file may be `<intrin.h>` or `<x86intrin.h>`, right? But VS2017 cannot resolve either of them. Is that due to the compiler I chose? In other words, is it necessary to use one of the other compilers you mentioned, i.e., gcc/clang/ICC/MSVC? — tuziYou, May 07 '20 at 12:53
IDK, works for me on Godbolt with both MSVC and Clang. https://godbolt.org/z/wK4iuh. Perhaps building an SGX application limits MSVC to a certain set of headers that doesn't include `intrin.h`? I've never done anything with SGX, I'm just here for the intel / x86 tags. I know a bit about it as a CPU mode from knowing about x86 ISA / assembly stuff, but nothing about tools to build SGX applications. Hopefully someone else will notice your question and answer it; I retagged it `[x86]` where hopefully more people will notice it. — Peter Cordes, May 07 '20 at 13:01
`#include <intrin.h>` works well outside the enclave but fails inside the enclave, where the compiler always throws an error saying it cannot resolve this header file. I guess this great tool is not allowed to be used inside the enclave, in other words, is considered to be insecure for the intel. I'm not sure whether there is any other possible solution to evaluate instructions inside the enclave precisely. Thank you for your help, which really extended my background knowledge of assembly language! Regards, You. — tuziYou, May 07 '20 at 13:15
*in other words, is considered to be insecure for the intel.* Uh, that's not exactly how I'd state that conclusion. More like MSVC doesn't think you should be using any intrinsics, or forgot to make it available. You might be able to find the definition of `__rdtsc` in `intrin.h` and copy it into your SGX application. Or if you were compiling for 32-bit mode, you could maybe use MSVC inline asm. (MSVC's inline asm support was so clunky that they disabled it for 64-bit mode.) — Peter Cordes, May 07 '20 at 13:24
Let me introduce SGX briefly here. An SGX application usually consists of two parts: an untrusted part (i.e., source files containing code developed for functional purposes rather than security) and a trusted part (i.e., the enclave, containing vital code whose security must be carefully considered and protected). The available header files are therefore limited: only those considered secure may be used inside the enclave. That is why I guess Intel does not support `<intrin.h>` inside the enclave, even though it is available outside. — tuziYou, May 07 '20 at 13:39
Anyway, I will try to copy the definition of rdtsc to the enclave. — tuziYou, May 07 '20 at 13:41
Right, thanks for that summary. That makes sense in general because *most* header files have prototypes for library functions you'd have to call. And obviously you can't call `printf` from inside the enclave, or any other function declared in `stdio.h`. But I think everything in `intrin.h` can just inline to a machine instruction. For example, `_popcnt_u32(x)` is something you can compute with a loop or bithacks; disallowing it is like disallowing the C++ `*` multiply operator for integers: you can easily program without it, it's just less convenient to need a shift/add loop, not more secure — Peter Cordes, May 07 '20 at 13:51
IDK, maybe there is some stuff in `intrin.h` that it makes sense to not provide, but then as a side effect you lose access to other stuff. For example rdtsc and rdrand are *not* something you could just compute a different way, so possibly there's some reason to disallow them. (Apparently SGX did disallow rdtsc entirely until SGX2, and then only with the right setting in a control register; IDK what the default for that is). In general it might make sense for *MSVC* to protect you from yourself by not allowing some intrinsics in the enclave, but that's Microsoft not Intel's choice. — Peter Cordes, May 07 '20 at 13:56
You mean, I implement rdtsc inside the enclave instead? Just as what you suggest "copy definition of rdtsc in the enclave"? — tuziYou, May 07 '20 at 13:57
Yes, I'm saying to work around this compiler/header limitation and try to get your compiler to emit an `rdtsc` instruction into your enclave machine code by copying some lines from `intrin.h` into your own header. I tried on Godbolt, using `-E` to get the compiler to output the contents of the header file. There's a `unsigned __int64 __rdtsc(void);` prototype in there, but using that manually just makes it compile to a function call to that name, not inline an `rdtsc` instruction. https://godbolt.org/z/n3BhEy. Some other line must enable it to be recognized as a compiler built-in. — Peter Cordes, May 07 '20 at 14:03
Got it. Thank you so much! I will have a try and update in time until the problem is resolved. — tuziYou, May 07 '20 at 14:11
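One way to get an `rdtsc` instruction emitted without `intrin.h` is inline assembly. The sketch below assumes GCC or Clang targeting x86 (for example a Linux toolchain); it does not apply to 64-bit MSVC, which has no inline asm, as noted in the comments above. `read_tsc` is a hypothetical helper name, and whether the instruction is permitted inside a particular enclave still depends on the hardware (SGX2 and CR4.TSD, per the forum post quoted earlier).

```cpp
#include <cstdint>

// Reads the time-stamp counter without any intrinsics header.
// Assumption: GCC/Clang extended inline asm on x86 or x86-64.
static inline std::uint64_t read_tsc() {
    std::uint32_t lo = 0;
    std::uint32_t hi = 0;
    // rdtsc places the low 32 bits in EAX and the high 32 bits in EDX.
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
```

Usage is the same as with the intrinsic: `std::uint64_t start = read_tsc();`, run the code of interest, then `std::uint64_t duration = read_tsc() - start;`.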

Kassem · Answer 1 · 2020-07-10T04:51:16.420

Measuring execution time inside the enclave can be a bit tricky. There are two solutions that I can think of, and each has its own pros and cons depending on the nature of your application.

Measuring the execution time in the untrusted code using ECALLs and OCALLs.

Before running the function whose performance you wish to evaluate (by measuring its execution time), start the timer in the untrusted code.

The timer can be started either:

Before the untrusted code calls the main function in the trusted code (the one whose execution time you wish to measure). Start the timer in the untrusted code, call the trusted function (ECALL), and stop the timer in the untrusted code when the ECALL returns.
While the untrusted code is running. Start the timer using an OCALL invoked by the trusted code and stop it with another OCALL invoked by the trusted code. The untrusted code should handle these OCALLs and start/stop the timer accordingly.
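Both options can be sketched in untrusted code with `std::chrono::steady_clock`. This is only an illustration under stated assumptions: `ecall_do_work_stub` is a hypothetical stand-in for an EDL-generated ECALL wrapper, and `ocall_timer_start` / `ocall_timer_stop` are hypothetical OCALL handlers that would have to be declared in your own EDL file.

```cpp
#include <chrono>
#include <cstdint>

using Clock = std::chrono::steady_clock;
using Nanos = std::chrono::nanoseconds;

// Hypothetical stand-in for an ECALL wrapper; a real one would enter the enclave.
inline int ecall_do_work_stub(std::uint64_t n) {
    volatile std::uint64_t acc = 0;  // volatile keeps the loop from being optimized away
    for (std::uint64_t i = 0; i < n; ++i) {
        acc = acc + i;
    }
    return 0;
}

// Option 1: time the whole ECALL from the untrusted side,
// which includes the cost of entering and leaving the enclave.
inline Nanos time_ecall(std::uint64_t n) {
    const Clock::time_point start = Clock::now();
    ecall_do_work_stub(n);
    const Clock::time_point stop = Clock::now();
    return std::chrono::duration_cast<Nanos>(stop - start);
}

// Option 2: untrusted handlers that the trusted code would invoke via OCALLs
// around the region of interest.
static Clock::time_point g_timer_start;
static Nanos g_last_elapsed{0};

inline void ocall_timer_start() { g_timer_start = Clock::now(); }

inline void ocall_timer_stop() {
    g_last_elapsed = std::chrono::duration_cast<Nanos>(Clock::now() - g_timer_start);
}
```

With option 2 the measured interval also contains the cost of the two OCALL transitions themselves, so it suits regions that run much longer than a single enclave exit.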

Depending on which part (one function or several) of the trusted code you want to time, one of the above solutions should work. Another thing to keep in mind is that you can keep a vector in the untrusted code to track the execution times of multiple functions. Once your code finishes executing, you can print the vector items or even perform some calculations on them.
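The vector idea might look like the sketch below. `TimingSample` and `TimingLog` are hypothetical names introduced for illustration; the log lives entirely in untrusted code and simply stores one labeled duration per measurement.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Nanos = std::chrono::nanoseconds;

// One labeled measurement, e.g. the name of the timed trusted function.
struct TimingSample {
    std::string label;
    Nanos elapsed;
};

// Untrusted-side log of measurements collected during a run.
class TimingLog {
public:
    void record(std::string label, Nanos elapsed) {
        samples_.push_back(TimingSample{std::move(label), elapsed});
    }

    std::size_t size() const { return samples_.size(); }

    Nanos total() const {
        Nanos sum{0};
        for (const TimingSample& s : samples_) {
            sum += s.elapsed;
        }
        return sum;
    }

    // Mean duration; returns zero for an empty log.
    Nanos mean() const {
        if (samples_.empty()) {
            return Nanos{0};
        }
        return total() / static_cast<Nanos::rep>(samples_.size());
    }

    const std::vector<TimingSample>& samples() const { return samples_; }

private:
    std::vector<TimingSample> samples_;
};
```

After the run, iterating over `samples()` lets you print each entry or compute other statistics.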

ECALLs and OCALLs can be costly if you perform too many of them. You can either measure the time of the ECALLs and OCALLs alone and subtract it from the overall execution time, or you can use the solution below.
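The subtraction could be sketched as follows, assuming you first time an empty ECALL (or OCALL) many times to estimate the transition cost. `median_of` and `subtract_overhead` are hypothetical helpers; using the median rather than the mean is a design choice made here to reduce the influence of outliers, not something the answer prescribes.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

using Nanos = std::chrono::nanoseconds;

// Median of a set of samples (upper middle element for even counts);
// returns zero for an empty set. Takes the vector by value so it can sort a copy.
inline Nanos median_of(std::vector<Nanos> samples) {
    if (samples.empty()) {
        return Nanos{0};
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}

// Removes the estimated transition overhead from a measurement,
// clamping at zero so noise cannot produce a negative duration.
inline Nanos subtract_overhead(Nanos measured, Nanos overhead) {
    return measured > overhead ? measured - overhead : Nanos{0};
}
```

For example, if repeated empty ECALLs give a median of 8 µs and the full call takes 50 µs, `subtract_overhead` attributes roughly 42 µs to the work inside the enclave.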

Measuring the execution time in the untrusted code using HotCalls.

HotCalls provide a faster interface with the enclave. This work was published by Ofir Weisse in Regaining Lost Cycles with HotCalls: A Fast Interface for SGX Secure Enclaves.

The code can be found in Ofir's repository. His sample code measures execution time, so it can be very useful for your case.

Thanks so much for such a detailed answer! First, the solution I currently use is to benchmark the entire execution time, including the time for switching the SGX context, as in the first method you mentioned above, and to measure the execution time of the same code on its own in another project (not an ECALL). I think this is sufficient to reveal the time cost of SGX, since the two projects run on the same platform. Second, the academic research you mentioned is a great reference for me and can serve as a baseline for judging my own evaluation results. Thanks a lot! — tuziYou, Jul 09 '20 at 08:34
