Two-way communication to PCIe device via /dev/mem in Linux user-space?

Question

I'm pretty sure I already know the answer to this question, since there are related questions on SO already (here, here, and here, and this was useful), but I wanted to be absolutely sure before I dive into kernel-space driver land (never been there before).

I have a PCIe device that I need to communicate with (and vice versa) from an app in Linux user space. By opening /dev/mem and then mmap'ing, I have been able to write a user-space driver built on top of pciutils that has allowed me to mmap the BARs and successfully write data to the device. Now we need communication to go the other direction, from the PCIe device to the Linux user app. In order for this to work, we believe we are going to need a large chunk (~100MB) of physically contiguous memory that never gets paged/swapped. Once allocated, that address will need to be passed to the PCIe device so it knows where to write its data (thus I don't see how this could be virtual, swappable memory). Is there any way to do this without a kernel-space driver? One idea was floated here: perhaps we can open /dev/mem and then feed it an ioctl command to allocate what we need? If this is possible, I haven't been able to find any examples online yet and will need to research it more heavily.

Assuming we need a kernel-space driver, it will be best to allocate our large chunk during bootup, then use ioremap to get a kernel virtual address, then mmap from there to user space, correct? From what I've read on kmalloc, we won't get anywhere close to 100MB using that call, and vmalloc is no good since that's virtual memory. In order to allocate at bootup, the driver should be statically linked into the kernel, correct? This is basically an embedded application, so portability is not a huge concern to me. A module rather than a statically linked driver could probably work, but my worry there is that memory fragmentation could prevent a physically contiguous region from being found, so I'd like to allocate it as soon as possible after power-on. Any feedback?

EDIT1: My CPU is an ARM7 architecture.

score 2 · Accepted Answer · answered Jan 19 '16 at 23:25

Hugepages-1G

Current x86_64 processors support not only 4k and 2M pages, but also 1G pages (the flag pdpe1gb in /proc/cpuinfo indicates support).

These 1G-pages must already be reserved at kernel boot, so the boot-parameters hugepagesz=1GB hugepages=1 must be specified.

Then, the hugetlbfs must be mounted:

mkdir /hugetlb-1G
mount -t hugetlbfs -o pagesize=1G none /hugetlb-1G

Then open some file and mmap it:

fd = open("/hugetlb-1G/page-1", O_CREAT | O_RDWR, 0755);
addr = mmap(NULL, SIZE_1G, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

You can now access 1G of physically contiguous memory at addr. To be sure it doesn't get swapped out you can use mlock (but this is probably not even necessary at all for hugepages).
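The two calls above can be fleshed out into a small helper. This is only a sketch under assumptions: the mount point is the /hugetlb-1G directory created earlier, the helper names `round_up_to_hugepage` and `map_hugepage_file` are made up for illustration, and error handling is reduced to `perror`. Rounding the length up to a multiple of the huge page size is included because a hugetlbfs mapping is made in whole huge pages.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define SIZE_1G ((size_t)1 << 30)

/* Round len up to a multiple of hugepage_size (which must be a power of two). */
static size_t round_up_to_hugepage(size_t len, size_t hugepage_size)
{
    return (len + hugepage_size - 1) & ~(hugepage_size - 1);
}

/* Map len bytes backed by a file on a hugetlbfs mount.
 * Hypothetical helper: returns the mapping, or NULL on failure. */
static void *map_hugepage_file(const char *path, size_t len)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0) {
        perror("open");
        return NULL;
    }

    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (addr == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }

    /* Huge pages are not swapped, so this is a belt-and-braces step;
     * a failure here is reported but not treated as fatal. */
    if (mlock(addr, len) != 0)
        perror("mlock");

    return addr;
}
```

A caller would use it as `void *buf = map_hugepage_file("/hugetlb-1G/page-1", round_up_to_hugepage(100u << 20, SIZE_1G));` and check the result for NULL.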

Even if your process crashes, the huge page stays reserved and can be mapped again as above, so the PCIe device will not write rogue data into system or process memory.

You can find out the physical address by reading /proc/pid/pagemap.
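A rough sketch of that lookup follows, reading /proc/self/pagemap for the calling process. The function names are made up for illustration. The layout assumed here is the documented pagemap format: one 64-bit entry per virtual page, bit 63 meaning "page present" and bits 0-54 holding the page frame number. On recent kernels the frame number reads as zero unless the process has CAP_SYS_ADMIN, so the sketch treats a zero PFN as "unknown".

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGEMAP_PRESENT  (1ULL << 63)        /* bit 63: page is in RAM */
#define PAGEMAP_PFN_MASK ((1ULL << 55) - 1)  /* bits 0-54: page frame number */

/* Decode one pagemap entry into a physical address.
 * Returns 0 if the page is not present or the PFN is hidden. */
static uint64_t pagemap_entry_to_phys(uint64_t entry, uint64_t vaddr,
                                      uint64_t page_size)
{
    if (!(entry & PAGEMAP_PRESENT))
        return 0;
    uint64_t pfn = entry & PAGEMAP_PFN_MASK;
    if (pfn == 0)
        return 0;
    return pfn * page_size + (vaddr % page_size);
}

/* Look up the physical address behind a virtual address of this process.
 * The page should have been touched (and ideally locked) beforehand. */
static uint64_t virt_to_phys(const void *vaddr)
{
    uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
    uint64_t v = (uint64_t)(uintptr_t)vaddr;
    off_t offset = (off_t)(v / page_size * sizeof(uint64_t));

    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return 0;

    uint64_t entry = 0;
    ssize_t n = pread(fd, &entry, sizeof(entry), offset);
    close(fd);
    if (n != (ssize_t)sizeof(entry))
        return 0;

    return pagemap_entry_to_phys(entry, v, page_size);
}
```

Calling `virt_to_phys(addr)` on the start of the huge-page mapping would give the address to hand to the device; a return value of 0 means the lookup failed or the process lacks the privilege to see frame numbers.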

Ctx


shoot, probably should've mentioned the CPU I'm using (at least for now) is an ARM7 architecture. Not sure if that makes your answer moot or not. Thanks a lot for the info though, I'll definitely explore this method in the coming days. – yano Jan 20 '16 at 03:03
I think 32-bit ARM does not have 1G hugepages (maybe smaller ones, though). Another way to go could be to reserve part of your memory (e.g. kernel boot parameter `memmap=100M@0x10000000` reserves 100MB starting at physical address 0x10000000, i.e. 256MB) and mmap it from /dev/mem into your process's memory. You have to make sure that the kernel was not compiled with CONFIG_DEVMEM_STRICT, however – Ctx Jan 20 '16 at 10:40
For a kernel-space driver, `dma_alloc_coherent` can allocate memory suitable for access by both the device and the CPU. The driver's file operation handler for `mmap` can call `dma_mmap_coherent` to map it to the user's address space. The difficulty is that the size of memory you can allocate with `dma_alloc_coherent` is normally quite limited. You can overcome that using the `cma` kernel boot parameter, e.g. `cma=100M`. In practice, you usually need to make the `cma` parameter larger than you need, as other drivers may steal some of it before your driver does. – Ian Abbott Jan 20 '16 at 18:15

score 0 · Answer 2 · answered Feb 12 '16 at 02:25

Actually Ctx's comment about memmap is what got me down the right path. To reserve memory, I gave a bootloader argument as memmap=[size]$[location] which I found here. Different symbols mean different things, and they aren't exactly intuitive. Just another slight correction, the flag is CONFIG_STRICT_DEVMEM, which my kernel was not compiled with.

There are still some mysteries. For instance, the [location] in the memmap argument seemed to be meaningless. No matter what I set for the location, linux took all that was not reserved with [size] in one contiguous chunk, and the space that I reserved was at the end. The only indication of this was in /proc/iomem: the amount of space I reserved matched the gap between the end of linux memory space and the end of system memory space. I could find no indication anywhere that linux said "I see your reserved chunk and I won't touch it" other than that it wasn't taken by linux in /proc/iomem. But the FPGA has been writing to this space for days now with no visible ill effects for linux, so I guess we're all good! I can just mmap that location and read the data (surprised this works since linux doesn't indicate this exists, but glad it does). Thanks for the help! Ian, I'll come back to your comment if I go to kernel driver space.
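For anyone following the same path, mapping the reserved region looks roughly like the sketch below. The base address and size are hypothetical placeholders, not the values from my board; substitute whatever gap /proc/iomem shows on your system. The helper name `map_reserved_region` is likewise made up. It assumes a kernel built without CONFIG_STRICT_DEVMEM and a process allowed to open /dev/mem. O_SYNC is passed because on many architectures it requests an uncached mapping, which is what you want when a device writes the memory behind the CPU's back.

```c
#define _FILE_OFFSET_BITS 64
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* HYPOTHETICAL values: take the real ones from /proc/iomem on the target. */
#define RESERVED_PHYS_BASE 0x38000000ULL
#define RESERVED_SIZE      ((size_t)100 << 20)

/* Map a page-aligned physical region read-only through /dev/mem.
 * Returns NULL on failure (bad arguments, no permission, strict devmem). */
static const volatile uint8_t *map_reserved_region(uint64_t phys, size_t len)
{
    uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);

    /* mmap offsets must be page aligned */
    if (len == 0 || phys % page_size != 0) {
        errno = EINVAL;
        return NULL;
    }

    int fd = open("/dev/mem", O_RDONLY | O_SYNC);
    if (fd < 0) {
        perror("open /dev/mem");
        return NULL;
    }

    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, (off_t)phys);
    close(fd); /* the mapping outlives the descriptor */
    if (p == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }

    /* volatile: the device changes this memory without the compiler knowing */
    return (const volatile uint8_t *)p;
}
```

Usage would be along the lines of `const volatile uint8_t *buf = map_reserved_region(RESERVED_PHYS_BASE, RESERVED_SIZE);` followed by a NULL check before reading what the device wrote.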
