
I have been trying to observe 4K aliasing in order to reproduce the MemJam attack on AES: https://arxiv.org/pdf/1711.08002.pdf. From what I read in the Intel documentation, 4K aliasing occurs when a store is followed by a load from an address whose low 12 bits are identical to those of the store address (for instance, a store to address a followed by a load from address b = a + 0x1000).
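To make the address condition concrete, here is a minimal sketch of the "same low 12 bits" test. The names `low12_match` and `alias_partner` are hypothetical helpers written for this illustration; they come neither from the paper nor from the Intel documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_4K 0x1000u

/* True when both addresses have the same low 12 bits,
   i.e. the same offset within a 4 KiB page. */
static bool low12_match(uintptr_t store_addr, uintptr_t load_addr)
{
    return ((store_addr ^ load_addr) & (PAGE_4K - 1)) == 0;
}

/* Address one 4 KiB page above `addr`; it shares the low 12 bits of `addr`. */
static uintptr_t alias_partner(uintptr_t addr)
{
    assert(addr <= UINTPTR_MAX - PAGE_4K);  /* guard against wrap-around */
    return addr + PAGE_4K;
}
```

With these, `low12_match(a, alias_partner(a))` holds for any `a`, while an address that differs by, say, 0x40 within the page does not match.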
I first tried simpler code (on an i7-6700T CPU, 2.80GHz), as in "Understanding 4K aliasing on Intel CPU's", and I did manage to observe a timing difference between the aliasing and non-aliasing cases: if b[i] *= 4.321f follows a[i] += 1.234f, I get a higher execution time than when b[i+offset] *= 4.321f follows a[i] += 1.234f.
However, when I targeted the first entry of an SBox in a shared library containing the AES functions, I was unable to detect 4K aliasing. My loop first writes to an address that should create a false dependency (the address of sbox[0] + 0x1000), based on Listing 3 from https://arxiv.org/pdf/1711.08002.pdf, and then reads an SBox value (for every offset in the table).
Here is the code (adapted from Listing 3 in https://arxiv.org/pdf/1711.08002.pdf) that I used to store to the target address + 0x1000:

void* probe;
void* fake_probe;

void slow_down(){
__asm__ volatile(".intel_syntax noprefix\n\t"
                 "mov %%rax, fake_probe\n\t"
                 "mov %%al, byte ptr 0\n\t"
                 ".att_syntax noprefix\n\t"
                 :
                 :
                 :"rax", "al"
                 );
}

This is my main loop (note that the SBox starts at offset 0x3000 in my shared library):

int main(){

    //mapping the shared library
    int fd = open("../libaes.so", O_RDONLY);
    size_t size = lseek(fd, 0, SEEK_END);
    if (size == 0)
      exit(-1);
    size_t map_size = size;
    if ((map_size & 0xFFF) != 0)  // parentheses needed: != binds tighter than &
    {
      map_size |= 0xFFF;
      map_size += 1;
    }
    long long b = (long long)mmap(0, map_size, PROT_READ, MAP_SHARED, fd, 0);  // getting the mapped address for the shared library

    probe = (void*)(b + 0x3000); // this is the mapped address of the sbox
    fake_probe = (void*) (b + 0x4000); // mapped address of the sbox + 0x1000, hoping to incur 4k aliasing

    const int nb_loops = 10000;

    //Actual loop
    uint8_t temp = 0;
    for(int i = 6; i < nb_loops + 6; i++){
        size_t time = rdtsc();
        for (int j = 0; j < nb_loops; j++){ // a loop for each offset, to maximize chances of detecting time difference
            slow_down(fake_probe); // writing to the overlapping address
            temp += getSBoxVal(i % 256); // reading from the SBox
          }
        size_t delta = rdtsc() - time;
        printf("time diff %zu offset %d\n", delta, i%256);
      }
    return 0;
}  
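The mapping step in `main` rounds the file size up to a whole number of 4 KiB pages. A compact way to write that rounding is sketched below; `round_up_to_page` is a hypothetical helper name, and the sketch assumes the page size is a power of two. Note that in C `!=` binds tighter than `&`, so a test such as `x & 0xFFF != 0` needs explicit parentheses around the masking.

```c
#include <assert.h>
#include <stddef.h>

/* Round `size` up to the next multiple of `page_size`.
   Assumes `page_size` is a non-zero power of two (e.g. 0x1000). */
static size_t round_up_to_page(size_t size, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (size + page_size - 1) & ~(page_size - 1);
}
```

For example, `round_up_to_page(0x3001, 0x1000)` gives 0x4000, and a size that is already page-aligned is returned unchanged.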

Here is an extract from the output:

    time diff 204786 offset 0
    time diff 205464 offset 1
    time diff 204314 offset 2
    time diff 204393 offset 3
    time diff 205022 offset 4
    time diff 204847 offset 5
    time diff 205602 offset 6
    time diff 205536 offset 7
    time diff 204143 offset 8
    time diff 204892 offset 9
    time diff 204734 offset 10
    time diff 204714 offset 11
    time diff 204141 offset 12
    time diff 205575 offset 13
    time diff 204468 offset 14
    time diff 205200 offset 15
    time diff 204435 offset 16
    time diff 205250 offset 17
    time diff 204639 offset 18
    time diff 205105 offset 19
    time diff 205054 offset 20
    time diff 204419 offset 21
    time diff 204905 offset 22
    time diff 204575 offset 23
    time diff 204331 offset 24
    time diff 205296 offset 25
    time diff 205287 offset 26
    time diff 204827 offset 27
    time diff 204947 offset 28
    time diff 205002 offset 29
    time diff 204908 offset 30
    time diff 204578 offset 31
    time diff 204738 offset 32
    time diff 205492 offset 33
    time diff 204708 offset 34
    time diff 205004 offset 35
    time diff 205228 offset 36
    time diff 205513 offset 37
    time diff 205026 offset 38
    time diff 204936 offset 39
    time diff 204942 offset 40
    time diff 205575 offset 41
    time diff 205014 offset 42
    time diff 205493 offset 43
    time diff 204321 offset 44
    time diff 204943 offset 45
    time diff 205065 offset 46
    time diff 203859 offset 47
    time diff 204617 offset 48
    time diff 205343 offset 49
    time diff 205191 offset 50
    time diff 204562 offset 51
    time diff 204301 offset 52
    time diff 204862 offset 53
    time diff 204808 offset 54
    time diff 205291 offset 55
    time diff 205395 offset 56
    time diff 205836 offset 57
    time diff 205113 offset 58
    time diff 205069 offset 59
    time diff 205235 offset 60
    time diff 204705 offset 61
    time diff 205303 offset 62
    time diff 204897 offset 63
    time diff 205474 offset 64
    time diff 204988 offset 65
    time diff 204772 offset 66
    time diff 205180 offset 67
    time diff 205724 offset 68
    time diff 204863 offset 69
    time diff 205075 offset 70
    time diff 205389 offset 71
    time diff 204409 offset 72
    time diff 204278 offset 73
    time diff 205162 offset 74
    time diff 204195 offset 75
    time diff 205581 offset 76
    time diff 204722 offset 77
    time diff 204732 offset 78
    time diff 204783 offset 79
    time diff 204631 offset 80
    time diff 204151 offset 81
    time diff 204605 offset 82
    time diff 204681 offset 83
    time diff 205117 offset 84
    time diff 205426 offset 85
    time diff 211020 offset 86
    time diff 204672 offset 87
    time diff 205362 offset 88
    time diff 204316 offset 89
    time diff 204591 offset 90
    time diff 204722 offset 91
    time diff 204629 offset 92
    time diff 204826 offset 93
    time diff 204881 offset 94
    time diff 204990 offset 95
    time diff 204122 offset 96
    time diff 205460 offset 97
    time diff 204467 offset 98
    time diff 204905 offset 99
    time diff 205113 offset 100
    time diff 204948 offset 101
    time diff 205373 offset 102
    time diff 205028 offset 103
    time diff 205575 offset 104
    time diff 204445 offset 105
    time diff 204828 offset 106
    time diff 205083 offset 107
    time diff 204696 offset 108
    time diff 205053 offset 109
    time diff 205232 offset 110
    time diff 204764 offset 111
    time diff 205353 offset 112
    time diff 204380 offset 113
    time diff 204921 offset 114
    time diff 205339 offset 115
    time diff 205841 offset 116
    time diff 205365 offset 117
    time diff 204585 offset 118
    time diff 205220 offset 119
    time diff 205272 offset 120
    time diff 205155 offset 121
    time diff 205222 offset 122
    time diff 204817 offset 123
    time diff 204835 offset 124
    time diff 205339 offset 125
    time diff 205094 offset 126
    time diff 205555 offset 127
    time diff 204817 offset 128
    time diff 204665 offset 129
    time diff 205561 offset 130

I wonder whether I misunderstood the way 4K aliasing works, or maybe the way Listing 3 from the article works?
I would be very grateful for any help.

grebs
  • Why are you using `%%rax` inside `.intel_syntax noprefix`? Not needing a `%` prefix is what `noprefix` means. Also, `mov %%al, byte ptr 0` is `mov reg, immediate`, not a memory access, despite the `byte ptr`. It doesn't actually load from absolute address `0`. Use square brackets for that (`mov al, [0]`). But IDK why you'd want to; that would just segfault. – Peter Cordes Jun 18 '20 at 23:27
  • Or if you were trying to store, then don't use `.intel_syntax`; `"mov %%rax, fake_probe\n\t"` would be correct AT&T syntax for a store to the `fake_probe` pointer variable. (Not dereferencing the pointer it used to hold, just writing that static location; x86 doesn't have memory-indirect addressing.) No reason to use absolute addressing though; use `"mov %%rax, fake_probe(%rip)\n\t"`. And if you're storing, you don't need an RAX clobber; only a `"memory"` clobber if your C ever reads or writes `fake_probe`. – Peter Cordes Jun 18 '20 at 23:30
  • TL:DR: your asm doesn't make sense; double check by disassembling and/or single-stepping with a debugger (that uses asm syntax you understand, and looking at register values) to make sure it's doing what you want. – Peter Cordes Jun 18 '20 at 23:33
  • Thank you for your reply. My goal with the `slow_down` function was to make a store in a "fake" address. Following your comments, I tried to update the assembly code (I'm sorry, I'm still kind of a newbie in assembly) to a code that gave me a difference in execution time between `slow_down(var); var2 += 1;` and `slow_down(var); var3 += 1;` where `var2` is aligned with `var` and `var3` is not. I did not understand the comment about `"mov %%rax, fake_probe(%rip)\n\t"`, though... What does `(%rip)` do in that context? – grebs Jun 19 '20 at 16:11
  • It makes it use a RIP-relative addressing mode like normal for accessing static data on x86-64. [How do RIP-relative variable references like "\[RIP + \_a\]" in x86-64 GAS Intel-syntax work?](https://stackoverflow.com/q/54745872) / [Why is the address of static variables relative to the Instruction Pointer?](https://stackoverflow.com/q/40329260) – Peter Cordes Jun 19 '20 at 16:22
  • Thank you for your help! I still need to understand better how all of this works before I edit my question, but I will do so as soon as I can. – grebs Jun 19 '20 at 17:10
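Following the comments above, one way to sidestep the syntax problems is to pass the destination as a memory operand and let the compiler pick the addressing mode. What follows is only a sketch under stated assumptions: GCC or Clang extended asm on x86-64, with a plain volatile store as a fallback elsewhere. `store_byte` is a hypothetical name and is not Listing 3 from the paper. The destination must be writable memory; a store into a mapping created with `PROT_READ` only would fault.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store one byte to *addr. On x86-64 with GCC/Clang this emits a single
   `movb` whose destination is the memory operand chosen by the compiler. */
static inline void store_byte(volatile uint8_t *addr, uint8_t value)
{
    assert(addr != NULL);
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ volatile("movb %1, %0"
                     : "=m"(*addr)   /* output: the byte being written */
                     : "q"(value));  /* input: value in a byte register */
#else
    *addr = value;                   /* portable fallback: volatile store */
#endif
}
```

Because the written byte is declared as an output operand, no register clobber list is needed. Disassembling the result, as suggested in the comments, is still the way to confirm that the emitted instruction is the intended store.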

0 Answers