
I need to implement the AES algorithm on a smartphone with an ARM Cortex-A15 processor (Samsung Galaxy Note 3, etc.) and need to observe and save cache timings for each process and round. How do I go about it? To be precise, I need to observe the time taken by the processor to run each round of AES per plaintext-key pair. I am trying to assess the practicality of timing attacks on smartphones (focusing on Bernstein's modified attacks, but I will also look at the feasibility of both trace-driven and access-driven cache attacks). It is for academic purposes. I understand the architecture of the processor used. The problem lies in the assembly programming (I am not getting the right code) as well as in how to load this program onto the smartphone.

Tashi
  • Start with the chip documentation to find out what measurements, if any, are available. It's not likely that there are any, but you might get lucky. – old_timer Jun 20 '16 at 18:24
  • I second @dwelch - get the datasheet for the chip. Also, can you define what exactly you call "cache timings"? – tum_ Jun 20 '16 at 20:55
  • Could you please clarify the question (with an [edit], not in the comments)? Reading between the lines, what I _think_ you're asking about is that you're implementing an AES algorithm and want to see whether it might be vulnerable to timing attacks via an attacker forcing cache misses. Unless it's the opposite, and you're trying to _do_ that timing attack on an existing implementation. I dunno... (side note: what modern smartphone OS doesn't already provide a robust crypto API? Unless this is a specific academic exercise, do yourself a favour and avoid reinventing wheels...) – Notlikethat Jun 20 '16 at 21:32
  • @A.Toumantsev By cache timings I mean the time taken by the AES-128 algorithm to run one round (of 10 rounds) per plaintext-key pair. Also, if it can be done, I need to observe the cache data per round. – Tashi Jun 21 '16 at 05:17
  • The phrase 'cache timing' seems bizarre. Do you mean the time it takes the AES round to execute? Or do you want to evict things from the cache in another process? Is the number of hits/misses the 'cache timing'? If it is a hit, shouldn't the time be constant? There are various performance-measuring CP15-type registers on some ARM CPUs; I don't know about the Cortex-A15 specifically. – artless noise Jun 21 '16 at 05:28
  • @Prakhar Much better after your edit. The datasheet was suggested not for learning the architecture of the CPU but for finding out which built-in facilities would allow you to measure things such as the cache hit/miss ratio. I have never tried things like these myself (yet), but from what I've read so far you will need root privileges in your OS and may need to recompile the kernel or, at least, install your own kernel module. – tum_ Jun 21 '16 at 06:04
  • Also: "as well as how to load this program onto the smartphone." - this has nothing to do with ARM, so I suggest you add the [android] tag to your question. And obviously you need to focus on this problem first: if you can't install and run your app on a smartphone, the rest is irrelevant. – tum_ Jun 21 '16 at 06:51
  • @artlessnoise Sorry for the ambiguity; you have the right idea. By cache timing I mean the time taken by one AES round to execute. I will be correlating this time data for each key-plaintext pair to execute the 'cache timing attack'. I also need to check the contents of the cache after each iteration. If I can get the _number of cache hits/misses_ (which I thought I couldn't), that will be amazing. – Tashi Jun 21 '16 at 07:05
  • @Prakhar "I also need to check the contents of the cache after each iteration" - to be honest, I absolutely don't see how this can be possible on a smartphone. "If I can get the number of cache hit/miss" - no one has yet confirmed that you can; this depends on the chip, some allow this, some don't. You need to find out (from the datasheet). – tum_ Jun 21 '16 at 08:11
  • **If** the firmware allows the OS access to the PMUs and the kernel is compiled with perf events support - those are some potentially non-trivial ifs - then consider looking into the perf events API which should let you get at the [microarchitectural event counters](http://infocenter.arm.com/help/topic/com.arm.doc.ddi0438g/BIIDBAFB.html) that artlessnoise alludes to. Bear in mind, though, that unless you _do_ use your own AES implementation, on any modern smartphone SoC it's quite likely to be handed off to a hardware crypto accelerator, rendering your CPU-based analysis moot. – Notlikethat Jun 21 '16 at 08:50
  • @Notlikethat Have you ever managed to "check the contents of the cache"? Would you agree this is not possible w/out special equipment? – tum_ Jun 21 '16 at 10:32
  • @A.Toumantsev Well, [it's certainly _possible_](http://infocenter.arm.com/help/topic/com.arm.doc.ddi0438g/CIHGJJAE.html), but on a multi-core system running a full OS it's unlikely to be _useful_ without external halting debug (or potentially some very special custom hypervisor/firmware) - by the time you've got everyone into a position to read out their own cache RAMs safely in non-debug state, half the data that _was_ there is probably long gone. – Notlikethat Jun 21 '16 at 10:57
  • @Notlikethat "unlikely to be useful" or, in fact, necessary. The OP refers to Bernstein's attacks; I wonder if he has read Dr. Bernstein's actual papers describing the attack(s). As far as I can remember (from years ago), the methodology of the cache analysis was described right there, in enough detail to be used by anyone wanting to do similar research. [That's probably a topic for Cryptography.SE, anyway] – tum_ Jun 21 '16 at 11:25
  • @A.Toumantsev I have read and implemented Bernstein's paper. Here, I am also talking about implementing cache collision attacks, etc. What I am referring to is the Spreitzer 2013 paper [https://eprint.iacr.org/2013/172.pdf] on mobile devices. It has been done, but the methodology is not clear, hence the question. – Tashi Jun 21 '16 at 21:43
  • @Notlikethat Thank you for the answer. I am looking into it. I don't clearly understand why I need to use my own AES implementation. – Tashi Jun 21 '16 at 21:51
  • Well, not so much "your own" per se, but "a standalone software implementation not provided by the OS" as in that paper, because a timing analysis of a CPU copying the data into a DMA buffer, poking a crypto engine into action, then going off and doing something else until it gets a completion interrupt probably isn't very interesting. – Notlikethat Jun 21 '16 at 22:11
  • Many 'high security' implementations will add random accesses and delays to deter power analysis and timing attacks. This is especially true if the SoC has security hardware that does the AES in hardware. I think the [PMU cache hit/miss ratio](http://sandsoftwaresound.net/perf/perf-tut-count-hw-events/) is more fruitful than knowing exact details of the cache. Cache aliases (addresses that allocate to the same set) will be easy to find, and a simple loop with that step size will pollute the cache so that you can test hit/miss ratios and validate the feasibility of attacks. – artless noise Jun 24 '16 at 13:39
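
As a rough illustration of the "correlating this time data" step discussed in the comments, here is a minimal Python sketch of a Bernstein-style profile correlation for one key byte. Everything in it is an assumption made for illustration: the function names (`simulate`, `timing_profile`, `rank_key_byte`) are hypothetical, and the timings are synthetic, generated from a made-up leakage model in which the elapsed time depends on `plaintext[i] ^ key[i]`. It does not measure anything on a device; real `(plaintext, elapsed)` pairs would have to come from the phone.

```python
import random

def simulate(key, n, leak, rng):
    """Synthetic stand-in for on-device measurements: (plaintext, elapsed) pairs.
    Assumed model: each byte adds leak[p ^ k] to a base time, plus Gaussian noise."""
    samples = []
    for _ in range(n):
        pt = bytes(rng.getrandbits(8) for _ in range(16))
        elapsed = 1000.0 + sum(leak[p ^ k] for p, k in zip(pt, key)) + rng.gauss(0, 1)
        samples.append((pt, elapsed))
    return samples

def timing_profile(samples, byte_index):
    """Mean elapsed time for each value of one plaintext byte, minus the overall mean."""
    sums = [0.0] * 256
    counts = [0] * 256
    for pt, elapsed in samples:
        sums[pt[byte_index]] += elapsed
        counts[pt[byte_index]] += 1
    overall = sum(sums) / sum(counts)
    return [sums[v] / counts[v] - overall if counts[v] else 0.0 for v in range(256)]

def rank_key_byte(ref_profile, ref_key_byte, target_profile):
    """Rank all 256 guesses for the target key byte, best first.
    For a correct guess, target value v and reference value
    v ^ guess ^ ref_key_byte map to the same p ^ k, so the profiles line up."""
    scored = []
    for guess in range(256):
        score = sum(target_profile[v] * ref_profile[v ^ guess ^ ref_key_byte]
                    for v in range(256))
        scored.append((score, guess))
    return [guess for _, guess in sorted(scored, reverse=True)]

rng = random.Random(1)                         # fixed seed so the run is repeatable
leak = [rng.gauss(0, 1) for _ in range(256)]   # made-up per-index timing offsets
known_key = bytes(range(16))                   # key of the reference (profiling) run
secret_key = bytes(rng.getrandbits(8) for _ in range(16))

ref = timing_profile(simulate(known_key, 20000, leak, rng), 0)
tgt = timing_profile(simulate(secret_key, 20000, leak, rng), 0)
best_guess = rank_key_byte(ref, known_key[0], tgt)[0]
print("top-ranked guess matches secret byte 0:", best_guess == secret_key[0])
```

On real hardware the leakage is far weaker than in this synthetic model, so many more samples would be needed, and the ranking would typically narrow the candidates for each byte rather than pin down a single value.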

0 Answers