How does memory mapped I/O (MMIO) work on ARM architectures?

Question

I would like to understand how MMIO works on the ARM architecture.

I realized that ARM provides a 1:1 mapping from physical addresses to specific peripherals. For example, to manage GPIO on an ARM board such as the Raspberry Pi, the processor accesses specific physical addresses (apparently preconfigured by the manufacturer?) without configuring any registers beforehand.

I thought there would be some BAR-like register that routes read/write requests for specific physical addresses to peripherals. However, when I checked the spec for the BCM2835 (Raspberry Pi 3), the physical addresses are translated by another MMU called the VC/ARM MMU. Is it a common design in the ARM architecture to have another MMU that translates physical addresses to bus addresses?

Also, I was wondering how the SMMU (the IOMMU in x86 terms) fits into this picture. I found one article claiming that the VC/ARM MMU is an example of an SMMU, but I think that is not true? When I checked some monitor code and kernel driver code (not for the Raspberry Pi), it seemed that the SMMU is itself mapped to specific physical addresses, and the monitor/kernel uses those addresses to initialize and communicate with the SMMU. If the ARM architecture used the VC/ARM MMU as an SMMU, how could the physical address mapping for the SMMU itself be accessed to initialize the SMMU?

Lastly, I thought that all peripherals are managed by the SMMU. If some peripherals are always mapped to fixed physical addresses, what is the role of the SMMU? Why do some peripherals communicate with the PE (CPU) through fixed physical addresses while others go through the SMMU? How exactly do the peripherals and the PEs (processors) communicate in the ARM architecture?

I think your fundamental confusion is the incorrect notion that addresses are "memory" and I/O is a hole you have to punch through or some special thing that goes around the side; the 8088/86 became obsolete a long time ago. The address space is just addresses with fetch, read and write transaction types. The chip folks decide what sits at each address (specific to each manufacturer and possibly each individual chip design). It is not "memory space" but "address space"; the list of things that you decode the ADDRESS into is sram, dram, flash, uart, gpio... – old_timer, Oct 28 '22 at 16:29
dram and gpio are no more different or special in this case, one has one base address in the address space for that processor and the other has another specific base address. The big problem comes from documents that still use the term "memory map" or something with the word "memory" to define the address space, which is very much not all memory, and the processor core usually does not care what is where, it is just address bits. – old_timer, Oct 28 '22 at 16:31
also I assume textbooks and professors, many of whom may have been around during those x86 vs motorola days, still think in terms of memory mapped I/O. We have to stop doing that. – old_timer, Oct 28 '22 at 16:32
There is no typical solution for chip design, certainly not for chips with multiple processor cores. It is very much the chip team, not the processor core people, that determines that system design; in some cases it bleeds over into the hardware (the boards the chips are soldered to) as to where things are and how the software gets to them. There are probably more solutions than there are chip companies, as you will find that one chip company may do it more than one way across their product line. This broadcom one is interesting at best. arm does not have a one to one as shown btw. – old_timer, Oct 28 '22 at 16:35
looks like arm's smmu is so you can get at things you can't get at normally (PCIe ATS and PRI features), is my guess. – old_timer, Oct 28 '22 at 16:35
the above addressing in the broadcom products used on the, let's say, full sized raspberry pi products (not the mcu one) has used the same scheme from the first pi with an arm11 (pi zero uses/used that chip). they moved the arm address space around when they realized we might want more than 512Mbytes of dram, oops. that lesson was learned for life I assume. it's just an addressing scheme and sharing of dram/peripheral solution like any other similar chip would have. – old_timer, Oct 28 '22 at 16:39
@old_timer I appreciate the kind explanation and totally agree that the confusion comes from the name "MEMORY MAPPED IO". After reading some articles and textbooks, and based on your answer, I think I grasp the idea, but one thing I still cannot get is the SMMU. So I assume that the basic peripherals are usually mapped to fixed addresses (determined by the vendor), and the others, such as PCIe devices that cannot be mapped to one fixed address, are later configured by the SMMU and PCIe bridges (which are themselves configured through the initial fixed mapping)? – ruach, Oct 29 '22 at 02:42
@old_timer Also, one thing that is very confusing to me is how and when the SMMU is used. I thought that all peripherals access memory (DRAM) through the SMMU because it provides translation capabilities for I/O devices. If most of the peripherals are just mapped to fixed physical addresses, then why do we need the SMMU? – ruach, Oct 29 '22 at 02:48
I think that is too broad for stackoverflow, way too broad. you can certainly just contact arm for guidance on if and when you would purchase an smmu and then how to use it. – old_timer, Oct 30 '22 at 15:12

artless noise · Answer 1 · 2022-10-29T13:00:00.883

I am just answering your questions. There are many things wrong with some assumptions you have, which Old Timer tries to clarify.

Is it a common design to have another MMU that translates the physical addresses to bus addresses in ARM architecture?

No. Broadcom has an architecture license. They designed their own HDL code for the ARM CPU, and system details can be different. In fact, it is quite common, even for direct licensees (using ARM's HDL), that the systems differ from vendor to vendor.
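As a rough illustration of the Broadcom scheme mentioned in the question, the translation between peripheral bus addresses and ARM physical addresses is essentially a fixed offset. The sketch below assumes the values given in the BCM2835 ARM Peripherals datasheet (bus addresses `0x7Ennnnnn` appear at ARM physical `0x20nnnnnn`) and the `0x3F000000` base used on the later BCM2837; the function and table names are made up for this example, and other SoCs will differ.

```python
# Assumed values from the BCM2835 ARM Peripherals datasheet; other SoCs differ.
BUS_PERIPHERAL_BASE = 0x7E000000     # peripheral window as seen on the VideoCore bus
PERIPHERAL_WINDOW_SIZE = 0x01000000  # 16 MiB window

# ARM physical base of the same window, per SoC (hypothetical lookup table).
ARM_PERIPHERAL_BASE = {
    "BCM2835": 0x20000000,  # first Raspberry Pi, Pi Zero
    "BCM2837": 0x3F000000,  # Raspberry Pi 3
}


def bus_to_arm_physical(bus_addr: int, soc: str = "BCM2835") -> int:
    """Translate a peripheral bus address to the ARM physical address."""
    offset = bus_addr - BUS_PERIPHERAL_BASE
    if not 0 <= offset < PERIPHERAL_WINDOW_SIZE:
        raise ValueError(f"{bus_addr:#x} is outside the peripheral window")
    return ARM_PERIPHERAL_BASE[soc] + offset


GPIO_BUS_BASE = 0x7E200000  # GPIO block, bus address from the datasheet
print(hex(bus_to_arm_physical(GPIO_BUS_BASE)))             # 0x20200000
print(hex(bus_to_arm_physical(GPIO_BUS_BASE, "BCM2837")))  # 0x3f200000
```

The point is only that this "second MMU" is a coarse, vendor-specific remapping of address windows, not something the ARM architecture itself prescribes.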

If the ARM architecture used the VC/ARM MMU as an SMMU, how could the physical address mapping for the SMMU itself be accessed to initialize the SMMU?

If some peripherals are always mapped to fixed physical addresses, what is the role of the SMMU? Why do some peripherals communicate with the PE (CPU) through fixed physical addresses while others go through the SMMU? How exactly do the peripherals and the PEs (processors) communicate in the ARM architecture?

The typical solution to this is 'TrustZone'. Here, each access is tagged as either 'secure' or 'normal'. The master (CPU) tags the access, and either the peripheral or a bus access controller permits or prohibits access to the peripheral. These permissions are best statically mapped at boot time and locked.

By defining permitted use cases, the peripheral/master access patterns can be defined for the system. The complication is 'dynamic' peripherals and masters. The ARM TrustZone CPU is dynamic. A 'world switch' changes the CPU access and there are attacks on the communication interface between 'normal' and 'secure' worlds.
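The tagging idea can be sketched as a small model. This is only an illustration under assumed names: `Transaction`, `Region`, `is_permitted` and the region addresses are hypothetical and do not correspond to any real bus controller's interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    address: int
    non_secure: bool  # the NS tag the master attaches to the bus access


@dataclass(frozen=True)
class Region:
    base: int
    size: int
    secure_only: bool  # set once at boot and then locked


def is_permitted(txn: Transaction, regions: list[Region]) -> bool:
    """Model of a bus access controller: reject normal-world accesses to
    secure-only regions, and reject addresses that no region decodes."""
    for region in regions:
        if region.base <= txn.address < region.base + region.size:
            return not (region.secure_only and txn.non_secure)
    return False


# Hypothetical static map: a secure key store and a normal-world UART.
REGIONS = [
    Region(base=0x10000000, size=0x1000, secure_only=True),
    Region(base=0x10001000, size=0x1000, secure_only=False),
]

print(is_permitted(Transaction(0x10000000, non_secure=True), REGIONS))   # False
print(is_permitted(Transaction(0x10000000, non_secure=False), REGIONS))  # True
print(is_permitted(Transaction(0x10001000, non_secure=True), REGIONS))   # True
```

Because the check depends only on the tag carried by the transaction, the same rule applies to any bus master that tags its accesses, including DMA engines.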

The ARM AXI/AHB buses were originally designed for embedded devices. PCs, in contrast, have dynamic buses: ISA->EISA->PCMCIA->PCI (etc.). Addresses on these buses are typically dynamic (see the Note below). However, this bus structure has an expense. So some of your question is like asking why ARM isn't just like an x86 PC. They had different goals and they are different. You can't put a new graphics card into your cell phone.
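To make "dynamic" concrete: on a PC-style bus, software decides at enumeration time where each discovered device's register window lives, instead of reading it from a fixed table in the chip's datasheet. The sketch below is a heavily simplified model of that idea, loosely following the PCI convention that a window is a power of two in size and aligned to its own size. The function name, device names and addresses are hypothetical, and real enumeration involves far more than this.

```python
def assign_windows(sizes: dict[str, int], window_base: int) -> dict[str, int]:
    """Assign each device a naturally aligned address window, largest first."""
    assigned = {}
    next_free = window_base
    for name, size in sorted(sizes.items(), key=lambda item: -item[1]):
        if size <= 0 or size & (size - 1):
            raise ValueError(f"{name}: window size must be a power of two")
        base = (next_free + size - 1) & ~(size - 1)  # round up to alignment
        assigned[name] = base
        next_free = base + size
    return assigned


# Hypothetical devices discovered at boot, with the window sizes they request.
discovered = {"nic": 0x1000, "gpu": 0x1000000}
for name, base in assign_windows(discovered, 0xE0000000).items():
    print(name, hex(base))
# gpu 0xe0000000
# nic 0xe1000000
```

On a typical embedded SoC none of this happens at run time: the equivalent table is frozen into the interconnect when the chip is designed.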

Reference:

Note: The modular nature of the PC system was part of why the PC dominated Apple, Commodore, Atari, etc., in my opinion (the other aspect was piracy), contrary to others who think the answer is Microsoft. The hardware matters.

edited Oct 29 '22 at 13:00

answered Oct 29 '22 at 12:33

artless noise


Thanks a lot. I have a few remaining questions. I understand that ARM platforms normally do not need to consider extra peripherals being attached after manufacture, so it makes sense to have a fixed mapping for the peripherals installed by the vendor. Having said that, why is the SMMU required on an ARM platform? Is it only needed for more general ARM SoCs, such as the Mac M1 and M2, that are designed for PC-like usage (although those are not designed for attaching new PCIe devices or peripherals)? – ruach Oct 29 '22 at 20:24
What I really want to understand is how the CPU's read/write operations on the fixed addresses assigned to peripherals are translated and delivered to the proper end device. Are all peripherals and DRAM connected to the same AXI bus, so that each peripheral/DRAM processes only the transactions addressed to it? Or, based on the address, are the read/write operations processed by different units (such as the MMU vs. the SMMU) and sent to different buses? I really appreciate your answer once again. – ruach Oct 29 '22 at 20:25
Regarding TrustZone, I cannot understand how it works, including this comment from one of the references you provided: '''Mainly it is related to memory regions. All TrustZone compatible devices will tag AXI Bus access with an NS bit. This bit specifies whether the access is from a secure or normal world. In this way, even DMA peripherals under the control of the normal world can be isolated.''' – ruach Oct 29 '22 at 20:27
neither smmu nor trustzone are required, arm's roots predate that. – old_timer Oct 30 '22 at 15:16
maybe not in comments but the answer I deleted. There is no reason to assume that axi/etc goes all the way to each target. depending on the company/individual/etc they may already have an array of peripherals and an internal bus they have used for many years and simply apply that to a new processor core. does every post office have to use the exact same chairs and tables and boxes and machines to sort mail into the trucks? is it impossible to do that without the same exact uniform on? of course not – old_timer Oct 30 '22 at 15:18
axi is big and bulky and no reason to carry that to the target, most likely in your best interest not to. another question and answer is do they use a shared bus that touches all the targets or do the buses divide up into individual ones? you would assume that for peripherals on a different clock domain those at least have a separate bus/infrastructure from say the dram target – old_timer Oct 30 '22 at 15:19
also performance, you may wish the dram target to be a wider bus than say gpio. – old_timer Oct 30 '22 at 15:20
In fact, many vendors are using IP that works with PowerPC, ColdFire, RISC-V, x86, etc. They may have different bus structures or even their own proprietary bus. STM, Broadcom, NXP, Apple, Nordic, etc. will all have different structures. Usually only the 'end user' needs to think they are the same. – artless noise Nov 17 '22 at 19:02
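
The delivery question raised in the comments above comes down to address decoding: whatever the internal bus structure is, something in the chip looks at the address bits and selects a target. A minimal sketch, assuming an entirely hypothetical address map (the ranges and target names below are invented for illustration and do not describe any real chip):

```python
# Hypothetical address map: (base, size, target). Fixed when the chip is designed.
ADDRESS_MAP = [
    (0x00000000, 0x20000000, "dram"),
    (0x20200000, 0x00001000, "gpio"),
    (0x20201000, 0x00001000, "uart"),
]


def decode(address: int):
    """Return (target, offset within target) for an address, or None if no
    target claims it (on real hardware this would typically be a bus error)."""
    for base, size, target in ADDRESS_MAP:
        if base <= address < base + size:
            return target, address - base
    return None


print(decode(0x00001000))  # ('dram', 4096)
print(decode(0x20200004))  # ('gpio', 4)
print(decode(0x30000000))  # None
```

From the core's point of view a load or store to DRAM and one to GPIO are the same kind of transaction; only the decoded target differs, and the bus carrying it to that target can be whatever the chip designers chose.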
