I am working with a proprietary OS running on a Cortex-A5 core with the MMU, both caches (I and D) and the branch predictor enabled. The IO register space is mapped in the level 2 translation table after the MMU has been set up and activated and the caches have been enabled.
The current sequence is:
- Calculate the page table entry (PTE) value
- Calculate the offset for the PTE in the level 2 table
- Write the PTE into the level 2 table
- Data synchronization barrier (DSB)
- Invalidate the data cache for the mapped page
- Flush the branch predictor cache (which is supposedly unnecessary)
- Instruction synchronization barrier (ISB)
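For reference, here is a minimal C sketch of the sequence as listed, not the actual OS code. The names `io_pte_value`, `l2_index` and `map_io_page` are hypothetical. It assumes the ARMv7-A short-descriptor format with 4 KiB small pages, SCTLR.TRE = 0 and SCTLR.AFE = 0, a shared device memory type for the IO page, and a 32-byte cache line. The CP15 operations are only compiled when targeting ARMv7, so the arithmetic part can be checked on a host machine.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-A short-descriptor level 2 "small page" (4 KiB) entry bits. */
#define L2_XN          (1u << 0)  /* execute never */
#define L2_SMALL_PAGE  (1u << 1)  /* descriptor type: small page */
#define L2_B           (1u << 2)  /* B = 1 with TEX = 0, C = 0: shared device */
#define L2_AP_PRIV_RW  (1u << 4)  /* AP[2:0] = 0b001: privileged RW only */

#define PAGE_SIZE       0x1000u
#define CACHE_LINE_SIZE 32u       /* assumption: 32-byte lines */

/* Step 1: calculate the PTE value for a 4 KiB IO page (hypothetical helper). */
static inline uint32_t io_pte_value(uint32_t phys_addr)
{
    assert((phys_addr & (PAGE_SIZE - 1u)) == 0u);  /* must be page aligned */
    return (phys_addr & 0xFFFFF000u)
         | L2_AP_PRIV_RW | L2_B | L2_SMALL_PAGE | L2_XN;
}

/* Step 2: index of the entry in the 256-entry level 2 table, VA bits [19:12].
 * The byte offset into the table is this index times 4. */
static inline uint32_t l2_index(uint32_t virt_addr)
{
    return (virt_addr >> 12) & 0xFFu;
}

/* Steps 3 to 7, in the order given in the list above. */
static inline void map_io_page(volatile uint32_t *l2_table,
                               uint32_t virt_addr, uint32_t phys_addr)
{
    /* Step 3: write the PTE into the level 2 table. */
    l2_table[l2_index(virt_addr)] = io_pte_value(phys_addr);

#if defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
    uint32_t base = virt_addr & ~(PAGE_SIZE - 1u);
    uint32_t off;

    /* Step 4: data synchronization barrier. */
    __asm__ volatile("dsb" ::: "memory");

    /* Step 5: invalidate the D-cache by MVA for the mapped page (DCIMVAC). */
    for (off = 0u; off < PAGE_SIZE; off += CACHE_LINE_SIZE)
        __asm__ volatile("mcr p15, 0, %0, c7, c6, 1"
                         :: "r"(base + off) : "memory");

    /* Step 6: invalidate the whole branch predictor (BPIALL). */
    __asm__ volatile("mcr p15, 0, %0, c7, c5, 6" :: "r"(0u) : "memory");

    /* Step 7: instruction synchronization barrier. */
    __asm__ volatile("isb" ::: "memory");
#endif
}
```

For example, `map_io_page(table, 0x40123000u, 0xF8001000u)` writes the entry at index `0x23` of `table`. Both addresses are made-up values for illustration.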
This sequence doesn't seem to work with the I-cache enabled. Unfortunately, the debugging capabilities of the hardware are somewhat limited, but it seems that the instructions being executed are not the ones that should be executed at the given program counter. Disabling the I-cache, or adding an I-cache invalidation instruction after the sequence, seems to work around the problem, but the reason for this is unclear to me. Why would the I-cache be affected when mapping a data area? Is the sequence above correct? If not, what would the correct one be? (There is also still a small but non-negligible probability of a hardware bug.)