Can a coherency issue happen between a secure DMA and a non-secure CPU on a TrustZone system?

Question

I have run into a problem that I think is a coherency issue between DMA and the CPU. Here is the simplified use case.

1) A Cortex-A5 CPU writes to non-secure memory while in the non-secure state. The MMU is enabled and the memory attributes are Normal, shareable and cacheable, so the data may sit in the so-called non-secure cache according to the following references:

[1] "Data Cache Tag data format" in DDI0434B_cortex_a5_mpcore_r0p1_trm

[2] Answer from artless noise in "Secure mode can access secure / non secure memory how?"

[3] 6th page in http://atmel.force.com/support/servlet/fileField?id=0BEG000000002Ur

2) DMA issues a secure coherent read to that memory area. DMA is connected to ACP.

3) The SCU only checks cache lines tagged as secure, so the data is returned from main memory, according to "ACP requests" in DDI0434B_cortex_a5_mpcore_r0p1_trm.

4) The DMA may get stale data.

If my description is correct, here is my second question. Which action can solve this issue?

1) Clean the cache explicitly before DMA reads

2) Change the attribute of the memory to normal, shareable and non-cacheable.

Also, the TRM says in "Bus Interface Unit and SCU interface" that there is a write buffer used to hold data from cache evictions or non-cacheable write bursts before they are written out on the SCU interface. However, I cannot find any more information about the write buffer. May I assume that it has some magic to ensure coherency? I would appreciate it a lot if someone could explain the magic.

Considering more combinations, I came up with this table myself.

CPU state  | Memory     | DMA access | Result
-----------+------------+------------+-------
non-secure | non-secure | non-secure | OK
non-secure | non-secure | secure     | NG
non-secure | secure     | -          | NA
secure     | non-secure | non-secure | OK
secure     | non-secure | secure     | NG (same reason as the second case)
secure     | secure     | non-secure | NA
secure     | secure     | secure     | OK

The first column shows the state the CPU is in when it writes to the memory.

The second column shows whether the memory is secure or not. Here "secure" means access will be denied by AXI if the AxPROT[1] is high. The memory attributes are Normal, shareable and cacheable.

The third column shows what kind of access the DMA issues. The access is coherent.

The fourth column tells what will happen.

NG means DMA may get stale data.

NA means the access is impossible.

OK means DMA is coherent with CPU.

Is it right?
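The rows of the table follow one simple rule, sketched below in Python. This is only a hypothetical restatement of the table, not a description of what the hardware does. `table_result` and `label` are illustrative names, and the sketch assumes the CPU always accesses memory with the security attribute that matches the memory (so a secure CPU makes a non-secure access to non-secure memory).

```python
from itertools import product


def table_result(cpu_secure: bool, mem_secure: bool, dma_secure: bool) -> str:
    """Reproduce the Result column of the table (hypothetical helper)."""
    # A non-secure access to secure memory is denied, whoever issues it.
    if mem_secure and not (cpu_secure and dma_secure):
        return "NA"
    # Assumption: the cache line written by the CPU carries the security
    # attribute of the memory, so it is tagged secure only for secure memory.
    line_tagged_secure = mem_secure
    # The DMA lookup hits only when its attribute matches the line's tag.
    return "OK" if dma_secure == line_tagged_secure else "NG"


def label(secure: bool) -> str:
    return "secure" if secure else "non-secure"


# Print every combination of CPU state, memory type and DMA access type.
for cpu, mem, dma in product([False, True], repeat=3):
    print(f"{label(cpu):10} | {label(mem):10} | {label(dma):10} | "
          f"{table_result(cpu, mem, dma)}")
```

The loop prints eight rows rather than seven because the "-" row of the table covers both DMA access types.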

I added this part in response to Notlikethat's answer. It may be a nightmare in practice, but I want to know what happens in theory.

Suppose that the physical address 0x10 is mapped to two virtual addresses, NS:0x110 and S:0x210.
The CPU, in the secure state, writes the value 0x110 to NS:0x110 and then writes the value 0x210 to S:0x210.
So both may be in the L1 cache at the same time, right?
Then DMA1 makes a secure read to PA 0x10 and may get 0x210, while DMA2 makes a non-secure read to PA 0x10 and may get 0x110.
After that, it is unpredictable which one will end up in main memory.

Please confirm or correct my understanding. Thanks a lot.

I don't understand how your question differs from one without TrustZone. You have issues with a cache and DMA no matter what. The SCU is typically only for multi-CPU interfaces (with separate L1 and shared L2) and not DMA devices. Are you certain that the SCU would solve for the ones you have as **OK**? I think you still have issues. For something like video you can use HSYNC/VSYNC to flush video memory. For others, you need to make the memory non-cacheable. This is standard for many, many drivers/devices in the world. — artless noise, Dec 02 '16 at 14:11
@artlessnoise Note _"DMA is connected to ACP"_ - assuming the peripheral is making accesses with the appropriate cacheability and shareability attributes then it certainly should be coherent with the CPU, because that's what the ACP is _for_. — Notlikethat, Dec 02 '16 at 22:01
_"Suppose that the physical address 0x10 is mapped to two virtual addresses, NS:0x110 and S:0x210"_ - it doesn't work like that. There are still _two_ physical address spaces - secure VAs can map to secure or non-secure PAs, non-secure VAs can only map to non-secure PAs. The cache tags the security state as part of the PA, thus secure 0x10 and non-secure 0x10 are not the same thing. While the dirty data is in the cache the two are still entirely distinct; it's only unpredictable which line gets written back first, thus what ultimately ends up in the DRAM backing _both_ PAs. — Notlikethat, Dec 04 '16 at 12:26
@Notlikethat That sounds different from the answers in the question I refer to. I am also quite confused about the non-secure and secure physical address spaces. However, I think the following partly answers my original question. Section B3.4 in the ARMv7 ARM, DDI 0406C.c, says "A system implementation can alias parts of the Secure physical address space to the Non-secure physical address space ... the use of aliases in this way can require the use of cache maintenance operations ...". So there may be a coherency issue between a secure DMA and a non-secure CPU. — Hs Zhang, Dec 04 '16 at 14:05
@Notlikethat Sorry, I missed the OP's comment that DMA is connected to ACP. I wouldn't want to mix security with the ACP support. One hw glitch and your whole system may be compromised. I already have issues with checking TZASC validity; however, the TZASC is mainly static state. The ACP is much more dynamic and more likely to have some latent bug. — artless noise, Dec 05 '16 at 14:05
@artlessnoise Your comment seems to say ACP is not reliable? — Hs Zhang, Dec 05 '16 at 14:52
@HsZhang the linked answer is from the whole-system view, i.e. for all the memory/peripheral registers/etc. behind the TZASC ("partition checker") there _is_ only a single address map - the TZASC just controls which _parts_ of it represent secure vs. non-secure PAs. However, the CPUs (and by extension the ACP) are on the other side, and from their point of view secure/non-secure PAs are entirely separate. The coherency issue the ARM ARM mentions is the same one I've described below, and your conclusion is indeed right - it's only your reasoning to get there that's not quite correct. — Notlikethat, Dec 05 '16 at 22:40
@Notlikethat Got that. I think what you want to emphasize is that PA NS:0x10 and PA S:0x10 may not be the same memory location. However, they are the same in my system. I find that [Section 3.2.3 in Trustzone Security Whitepaper](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.prd29-genc-009492c/ch03s02s03.html) describes what I am doing. — Hs Zhang, Dec 06 '16 at 03:40
@HsZhang The issue with using the ACP in a security setting is that it is another point of attack. The TZASC might be buggy as well (or more so custom AHB partition logic by the SOC vendor). You rely on hardware to be correct for many TZ security solutions. If the ACP is for video or other high performance HW, then maybe it is worth the price. However, I would just use the traditional mechanism and avoid the ACP in many cases, i.e. assume the DMA memory should not be cached (which also saves the cache for other things). — artless noise, Dec 06 '16 at 13:54

Notlikethat · Accepted Answer · 2016-12-05T23:18:05.957

Architecturally, you can't make "secure accesses to non-secure memory". The secure and non-secure physical address spaces are independent (many TrustZone-capable systems have things which appear at different addresses to secure vs. non-secure accesses). In practice though, it is the case that a lot of peripherals (including memory controllers) aren't TrustZone-aware, thus on most systems are behind a TZASC at which point the secure and non-secure address spaces do largely overlap.

On the CPU, secure software accesses non-secure memory either via a page table entry with the NS bit set, or from monitor mode with SCR.NS set; in both cases, the resulting bus access has AxPROT[1] set, i.e. it is a non-secure access. If you want a peripheral to correctly access non-secure memory, you need to get it to issue non-secure accesses in a similar fashion.
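The page-table part of this rule can be written as a small truth function. The following is a minimal Python sketch, not code for any real device: `bus_access_is_nonsecure` is a hypothetical name, and it models only the translation-table NS bit (a non-secure CPU always issues non-secure accesses, a secure CPU follows the NS bit of the descriptor). The monitor-mode case is deliberately left out.

```python
def bus_access_is_nonsecure(cpu_secure_state: bool, pte_ns_bit: bool) -> bool:
    """Return True when the bus access carries AxPROT[1] = 1 (non-secure)."""
    if not cpu_secure_state:
        # In the non-secure state the NS bit of the descriptor is ignored:
        # every access goes out as non-secure.
        return True
    # In the secure state the NS bit of the descriptor selects which
    # physical address space the access targets.
    return pte_ns_bit


# Enumerate all four combinations.
for secure_state in (False, True):
    for ns_bit in (False, True):
        kind = "non-secure" if bus_access_is_nonsecure(secure_state, ns_bit) else "secure"
        print(f"secure state={secure_state!s:5} PTE.NS={ns_bit!s:5} -> {kind} access")
```

A DMA master has no page tables of its own here, so whatever drives its AxPROT[1] has to make the equivalent choice for the buffers it touches.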

Otherwise, what you have is a case of the more general problem of physical aliases, which are usually not recommended unless absolutely necessary - the more traditional example is systems with DRAM at fairly high physical addresses (e.g. 0x80000000, or even above 32 bits on LPAE-capable systems), part of which is aliased lower down (e.g. at 0x0 for boot vectors) for use until the MMU is set up. Even a physically-tagged cache has no idea that 0x0 and 0x80000000 actually end up at the same place on the other end of the interconnect, so if you were to use both at once you would indeed have a coherency nightmare which is outside the scope of the architecture, and can only be managed with explicit cache maintenance.

In the same manner, a TrustZone-aware cache, which includes the security state as part of the physical address tag, has no idea that S:0x80000000 and NS:0x80000000 might actually end up in the same place on your particular system (in general, they might well not), so again, the two addresses are not coherent unless managed manually - data written to one alias must be cleaned from the cache to a point beyond the TZASC (i.e. usually all the way to DRAM) before it is visible from the other. Note that if your Cortex-A5 system has an outer L2 cache like a PL310, that means cleaning the CPU caches by VA to PoC, followed by cleaning L2 by PA via secure/non-secure accesses as appropriate, which probably all has to be done by the secure world alone to avoid synchronisation problems. In theory having everything make non-cacheable accesses would work around the coherency issue by forcing all data to take the round trip through DRAM, although it's not impossible that certain outer cache configurations could still get in the way of that. Far better to make the DMA controller issue non-secure accesses directly where appropriate, so you can actually benefit from the caches rather than fight against them.
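The effect of the security state being part of the tag can be illustrated with a toy model. This is a sketch under simplifying assumptions, not a model of the Cortex-A5 or the SCU. It has a single write-back cache level and no L2. One dictionary stands in for DRAM that both address spaces alias to. The class and method names are invented for illustration.

```python
SECURE, NON_SECURE = "S", "NS"


class SecurityTaggedCache:
    """Toy write-back cache whose tag is (security state, physical address)."""

    def __init__(self, dram):
        self.dram = dram   # single backing store: S and NS alias to it
        self.lines = {}    # (security, pa) -> dirty data

    def cpu_write(self, security, pa, value):
        # Write-back behaviour: data stays in the cache until it is cleaned.
        self.lines[(security, pa)] = value

    def coherent_read(self, security, pa):
        # An ACP-style lookup only matches a line with the same security tag.
        if (security, pa) in self.lines:
            return self.lines[(security, pa)]
        # Otherwise the read falls through to memory.
        return self.dram.get(pa, 0)

    def clean(self, security, pa):
        # Write the dirty line back to memory; the line itself stays valid.
        if (security, pa) in self.lines:
            self.dram[pa] = self.lines[(security, pa)]


dram = {0x80000000: 0x00}
cache = SecurityTaggedCache(dram)
cache.cpu_write(NON_SECURE, 0x80000000, 0xAB)

print(hex(cache.coherent_read(NON_SECURE, 0x80000000)))  # hits the NS line
print(hex(cache.coherent_read(SECURE, 0x80000000)))      # misses, reads DRAM
cache.clean(NON_SECURE, 0x80000000)
print(hex(cache.coherent_read(SECURE, 0x80000000)))      # reads DRAM after clean
```

In this model the secure read only observes the non-secure write once the line has been cleaned to the shared backing store, which is the manual management described above.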

I meant that the CPU in the secure state makes a non-secure access to memory. What I am concerned about is whether the cache is virtually separated into secure and non-secure parts, and how the ACP manages coherency. So I added a more specific case to my question. — Hs Zhang, Dec 04 '16 at 05:46
