46

I'm trying to access physical memory directly for an embedded Linux project, but I'm not sure how I can best designate memory for my use.

If I boot my device normally and access /dev/mem, I can easily read and write to just about anywhere I want. However, that way I'm accessing memory that could just as easily be allocated to any process, which I don't want.

My code for /dev/mem is (all error checking, etc. removed):

mem_fd = open("/dev/mem", O_RDWR);
mem_p = malloc(SIZE + (PAGE_SIZE - 1));
if ((unsigned long) mem_p % PAGE_SIZE) {
    mem_p += PAGE_SIZE - ((unsigned long) mem_p % PAGE_SIZE);
}
mem_p = (unsigned char *) mmap(mem_p, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, mem_fd, BASE_ADDRESS);

And this works. However, I'd like to be using memory that no one else will touch. I've tried limiting the amount of memory that the kernel sees by booting with mem=XXXm, and then setting BASE_ADDRESS to something above that (but below the physical memory), but it doesn't seem to be accessing the same memory consistently.
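For reference, a minimal userspace sketch of the mem= approach just described: it maps a physical range through /dev/mem without the malloc()/MAP_FIXED step and lets the kernel choose the virtual address. The helper names page_floor and map_phys are hypothetical, and the page size is passed in rather than assumed. Caching behaviour of the mapping is not addressed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Round a physical address down to the start of the page containing it.
 * page_size must be a power of two. (Hypothetical helper.) */
static uint64_t page_floor(uint64_t phys, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return phys & ~(page_size - 1);
}

/* Map len bytes starting at physical address phys through an already
 * opened /dev/mem descriptor. Returns a pointer to the byte at phys,
 * or NULL if mmap() fails. (Hypothetical helper.) */
static void *map_phys(int fd, uint64_t phys, size_t len, uint64_t page_size)
{
    uint64_t base = page_floor(phys, page_size);  /* mmap offsets must be page aligned */
    size_t delta = (size_t)(phys - base);         /* distance of phys into its page */
    void *p = mmap(NULL, len + delta, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, (off_t)base);

    if (p == MAP_FAILED)
        return NULL;
    return (unsigned char *)p + delta;
}
```

With mem=95m, a call such as map_phys(mem_fd, 95u * 1024 * 1024, SIZE, 4096) would stand in for the malloc() and mmap() lines above, assuming 4 KiB pages.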

Based on what I've seen online, I suspect I may need a kernel module (which is OK) which uses either ioremap() or remap_pfn_range() (or both???), but I have absolutely no idea how; can anyone help?

EDIT: What I want is a way to always access the same physical memory (say, 1.5MB worth), and set that memory aside so that the kernel will not allocate it to any other process.

I'm trying to reproduce a system we had in other OSes (with no memory management) whereby I could allocate a space in memory via the linker, and access it using something like

*(unsigned char *)0x12345678

EDIT2: I guess I should provide some more detail. This memory space will be used for a RAM buffer for a high performance logging solution for an embedded application. In the systems we have, there's nothing that clears or scrambles physical memory during a soft reboot. Thus, if I write a bit to a physical address X, and reboot the system, the same bit will still be set after the reboot. This has been tested on the exact same hardware running VxWorks (this logic also works nicely in Nucleus RTOS and OS20 on different platforms, FWIW). My idea was to try the same thing in Linux by addressing physical memory directly; therefore, it's essential that I get the same addresses each boot.
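The reboot-survival idea can be sketched independently of how the memory is obtained: a small header at the start of the region lets the next boot decide whether the contents are a valid log or just uninitialised RAM. Everything here (LOG_MAGIC, struct log_header, log_attach, log_append) is hypothetical naming, and a real implementation would probably add a checksum. It is an illustration, not the code used on VxWorks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_MAGIC 0x4C4F4742u   /* arbitrary marker chosen for this sketch */

struct log_header {
    uint32_t magic;     /* LOG_MAGIC once the region has been initialised */
    uint32_t capacity;  /* number of payload bytes following the header */
    uint32_t head;      /* next write position within the payload */
};

/* Attach to the region. Returns 1 if a plausible log from a previous boot
 * was found and kept, 0 if the region had to be (re)initialised. */
static int log_attach(void *region, size_t size, struct log_header **out)
{
    struct log_header *h = region;
    uint32_t capacity;

    assert(region != NULL && size > sizeof(*h) && size <= UINT32_MAX);
    capacity = (uint32_t)(size - sizeof(*h));
    *out = h;

    if (h->magic == LOG_MAGIC && h->capacity == capacity && h->head < capacity)
        return 1;               /* contents survived: keep them */

    h->magic = LOG_MAGIC;       /* fresh region: start an empty log */
    h->capacity = capacity;
    h->head = 0;
    return 0;
}

/* Append bytes to the ring, wrapping around at the end of the payload. */
static void log_append(struct log_header *h, const void *data, size_t len)
{
    uint8_t *payload = (uint8_t *)(h + 1);
    const uint8_t *src = data;

    for (size_t i = 0; i < len; i++) {
        payload[h->head] = src[i];
        h->head = (h->head + 1) % h->capacity;
    }
}
```

On the boot after a watchdog reset, a return value of 1 from log_attach would be the cue to dump the old contents before logging resumes.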

I should probably clarify that this is for kernel 2.6.12 and newer.

EDIT3: Here's my code, first for the kernel module, then for the userspace application.

To use it, I boot with mem=95m, then insmod foo-module.ko, then mknod /dev/foo c 32 0, then run foo-user, where it dies. Running under gdb shows that it dies at the assignment, although within gdb I cannot dereference the address I get from mmap (although printf can).

foo-module.c

#include <linux/module.h>
#include <linux/config.h>
#include <linux/init.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <asm/io.h>

#define VERSION_STR "1.0.0"
#define FOO_BUFFER_SIZE (1u*1024u*1024u)
#define FOO_BUFFER_OFFSET (95u*1024u*1024u)
#define FOO_MAJOR 32
#define FOO_NAME "foo"

static const char *foo_version = "@(#) foo Support version " VERSION_STR " " __DATE__ " " __TIME__;

static void    *pt = NULL;

static int      foo_release(struct inode *inode, struct file *file);
static int      foo_open(struct inode *inode, struct file *file);
static int      foo_mmap(struct file *filp, struct vm_area_struct *vma);

struct file_operations foo_fops = {
    .owner = THIS_MODULE,
    .llseek = NULL,
    .read = NULL,
    .write = NULL,
    .readdir = NULL,
    .poll = NULL,
    .ioctl = NULL,
    .mmap = foo_mmap,
    .open = foo_open,
    .flush = NULL,
    .release = foo_release,
    .fsync = NULL,
    .fasync = NULL,
    .lock = NULL,
    .readv = NULL,
    .writev = NULL,
};

static int __init foo_init(void)
{
    int             i;
    printk(KERN_NOTICE "Loading foo support module\n");
    printk(KERN_INFO "Version %s\n", foo_version);
    printk(KERN_INFO "Preparing device /dev/foo\n");
    i = register_chrdev(FOO_MAJOR, FOO_NAME, &foo_fops);
    if (i != 0) {
        printk(KERN_ERR "Device couldn't be registered!\n");
        return -EIO;
    }
    printk(KERN_NOTICE "Device ready.\n");
    printk(KERN_NOTICE "Make sure to run mknod /dev/foo c %d 0\n", FOO_MAJOR);
    printk(KERN_INFO "Allocating memory\n");
    pt = ioremap(FOO_BUFFER_OFFSET, FOO_BUFFER_SIZE);
    if (pt == NULL) {
        printk(KERN_ERR "Unable to remap memory\n");
        unregister_chrdev(FOO_MAJOR, FOO_NAME);
        return -ENOMEM;
    }
    printk(KERN_INFO "ioremap returned %p\n", pt);
    return 0;
}
static void __exit foo_exit(void)
{
    printk(KERN_NOTICE "Unloading foo support module\n");
    unregister_chrdev(FOO_MAJOR, FOO_NAME);
    if (pt != NULL) {
        printk(KERN_INFO "Unmapping memory at %p\n", pt);
        iounmap(pt);
    } else {
        printk(KERN_WARNING "No memory to unmap!\n");
    }
    return;
}
static int foo_open(struct inode *inode, struct file *file)
{
    printk("foo_open\n");
    return 0;
}
static int foo_release(struct inode *inode, struct file *file)
{
    printk("foo_release\n");
    return 0;
}
static int foo_mmap(struct file *filp, struct vm_area_struct *vma)
{
    int             ret;
    if (pt == NULL) {
        printk(KERN_ERR "Memory not mapped!\n");
        return -EAGAIN;
    }
    if ((vma->vm_end - vma->vm_start) != FOO_BUFFER_SIZE) {
        printk(KERN_ERR "Error: sizes don't match (buffer size = %d, requested size = %lu)\n", FOO_BUFFER_SIZE, vma->vm_end - vma->vm_start);
        return -EAGAIN;
    }
    ret = remap_pfn_range(vma, vma->vm_start, (unsigned long) pt, vma->vm_end - vma->vm_start, PAGE_SHARED);
    if (ret != 0) {
        printk(KERN_ERR "Error in calling remap_pfn_range: returned %d\n", ret);
        return -EAGAIN;
    }
    return 0;
}
module_init(foo_init);
module_exit(foo_exit);
MODULE_AUTHOR("Mike Miller");
MODULE_LICENSE("NONE");
MODULE_VERSION(VERSION_STR);
MODULE_DESCRIPTION("Provides support for foo to access direct memory");

foo-user.c

#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    int             fd;
    char           *mptr;
    fd = open("/dev/foo", O_RDWR | O_SYNC);
    if (fd == -1) {
        printf("open error...\n");
        return 1;
    }
    mptr = mmap(0, 1 * 1024 * 1024, PROT_READ | PROT_WRITE, MAP_FILE | MAP_SHARED, fd, 4096);
    printf("On start, mptr points to 0x%lX.\n",(unsigned long) mptr);
    printf("mptr points to 0x%lX. *mptr = 0x%X\n", (unsigned long) mptr, *mptr);
    mptr[0] = 'a';
    mptr[1] = 'b';
    printf("mptr points to 0x%lX. *mptr = 0x%X\n", (unsigned long) mptr, *mptr);
    close(fd);
    return 0;
}
Mikeage
  • To clarify, you want to (in a module) return an address space to userspace acquired via vmalloc(), not kmalloc(), correct? How much memory do you actually need? – Tim Post Mar 15 '09 at 13:09
  • This is probably done easiest with kmalloc(), what you'd be doing is setting 1.5 MB of kernel space apart and presenting it to userspace. If that's what you want to do, I'll refresh myself on a few kernel innards and try to answer. – Tim Post Mar 15 '09 at 14:06
  • Note, doing this with vmalloc() can be an extremely obnoxious task. The amount you actually need to map influences the answer, so you're sure its 1.5 MB or less? – Tim Post Mar 15 '09 at 14:08
  • Yes, 1.5 MB. Maybe 2; never more than that. – Mikeage Mar 15 '09 at 19:59
  • Edited my answer regarding the remap_pfn_range function – shodanex Mar 16 '09 at 15:36
  • Why not add an ioctl() to this to make it very simple for userspace to get the correct address range? – Tim Post Jun 08 '09 at 01:54
  • +1 for an interesting question.. – Nils Pipenbrinck Oct 28 '09 at 17:42
  • I'm fascinated, but *why* is @Nils' comment flagged? – David Thomas Mar 07 '11 at 12:19
  • How do you see it's flagged? I assume because it's a relatively useless comment; "noise". Although as the OP, I like to know that I'm being upvoted :) – Mikeage Mar 07 '11 at 13:15
  • @Mikeage can you please elaborate why did you need to reserve ram area for logging and why you had to write driver for it? – Sumit Gemini Dec 31 '16 at 07:02
  • From the post: "This memory space will be used for a RAM buffer for a high performance logging solution for an embedded application. In the systems we have, there's nothing that clears or scrambles physical memory during a soft reboot. Thus, if I write a bit to a physical address X, and reboot the system, the same bit will still be set after the reboot." The idea is that if there's a problem, and the watchdog reboots, the log can be dumped on the next boot. It worked great, btw :) – Mikeage Jan 02 '17 at 02:55

5 Answers

17

I think you can find a lot of documentation about the kmalloc + mmap part. However, I am not sure that you can kmalloc so much memory in a contiguous way, and have it always at the same place. Sure, if everything is always the same, then you might get a constant address. However, each time you change the kernel code, you will get a different address, so I would not go with the kmalloc solution.

I think you should reserve some memory at boot time, i.e. reserve some physical memory so that it is not touched by the kernel. Then you can ioremap this memory, which will give you a kernel virtual address, and then you can mmap it and write a nice device driver.

This takes us back to Linux Device Drivers, available in PDF format. Have a look at chapter 15, which describes this technique on page 443.

Edit: ioremap and mmap. I think this might be easier to debug in two steps: first get the ioremap right and test it using a character device operation, i.e. read/write. Once you know you can safely access the whole ioremapped memory using read/write, try to mmap the whole ioremapped range.
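A userspace check for the first step might look like the sketch below. It assumes the driver implements read and write with file offsets (the module in the question leaves both NULL), and roundtrip_check is a hypothetical helper. Any ordinary file descriptor can stand in for the device while the test itself is being developed.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for pread()/pwrite() under strict -std= modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a known pattern at the given offset, read it back and compare.
 * Returns 0 if the data round-trips, -1 on an I/O error or a mismatch. */
static int roundtrip_check(int fd, off_t offset, size_t len)
{
    uint8_t out[256], in[256];

    assert(len > 0 && len <= sizeof(out));
    for (size_t i = 0; i < len; i++)
        out[i] = (uint8_t)(0xA5u ^ i);      /* position-dependent test pattern */

    if (pwrite(fd, out, len, offset) != (ssize_t)len) {
        perror("pwrite");
        return -1;
    }
    if (pread(fd, in, len, offset) != (ssize_t)len) {
        perror("pread");
        return -1;
    }
    return memcmp(out, in, len) == 0 ? 0 : -1;
}
```

Running it at several offsets across the buffer, including the last few bytes, gives some confidence in the ioremap step before mmap is brought in.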

And if you get into trouble, maybe post another question about mmapping.

Edit: remap_pfn_range. ioremap returns a kernel virtual address, which you must convert to a PFN (Page Frame Number) for remap_pfn_range. Now, I don't understand exactly what a PFN is, but I think you can get one by calling

virt_to_phys(pt) >> PAGE_SHIFT

This probably is not the Right Way (tm) to do it, but you should try it

You should also check that FOO_BUFFER_OFFSET really is the physical address of your RAM block, i.e. that, before the MMU comes into play, your memory starts at 0 in the memory map of your processor.
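To make the PFN arithmetic concrete, here is a small sketch using the numbers from the question. It assumes 4 KiB pages (EXAMPLE_PAGE_SHIFT is a stand-in for the kernel's PAGE_SHIFT, which depends on the kernel configuration) and that the reserved block really sits at physical address 95 MiB. phys_to_pfn and bytes_to_pages are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_PAGE_SHIFT 12u                          /* assumption: 4 KiB pages */
#define EXAMPLE_PAGE_SIZE  (1ull << EXAMPLE_PAGE_SHIFT)

/* A page frame number is the physical address divided by the page size.
 * In a driver, this is the kind of value remap_pfn_range() expects as its
 * pfn argument (as opposed to a kernel virtual address). */
static uint64_t phys_to_pfn(uint64_t phys)
{
    assert((phys & (EXAMPLE_PAGE_SIZE - 1)) == 0);      /* must be page aligned */
    return phys >> EXAMPLE_PAGE_SHIFT;
}

/* Number of pages needed to cover len bytes, rounding up. */
static uint64_t bytes_to_pages(uint64_t len)
{
    return (len + EXAMPLE_PAGE_SIZE - 1) >> EXAMPLE_PAGE_SHIFT;
}
```

Under these assumptions, the 1 MiB buffer at 95 MiB starts at PFN 0x5F00 and spans 256 pages.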

shodanex
  • When you say "I think you should reserve some memory at boot time, ie reserve some physical memory so that is is not touched by the kernel. " do you mean boot with mem=XXXm, where XXX is less than the actual address? That's what I was originally thinking. – Mikeage Mar 16 '09 at 10:43
  • [cont] I see some code there using ioremap(); I'll try this. So to confirm: boot with mem=XXX, ioremap(XXX+1), and then what's the best way to translate this to a userspace address? – Mikeage Mar 16 '09 at 10:55
  • Hrmmm. I tried this, and implemented the mmap as follows (all error checking removed): static int foo_mmap(struct file *filp, struct vm_area_struct *vma) { remap_pfn_range(vma, vma->vm_start, (unsigned long) pt, vma->vm_end - vma->vm_start, PAGE_SHARED); } – Mikeage Mar 16 '09 at 11:24
  • When I do the mmap, I get what seems to be a valid address, but any attempts to write to it result in: Kernel unaligned instruction access in arch/mips/kernel/unaligned.c::do_ade, line 544[#1]: – Mikeage Mar 16 '09 at 11:26
  • mmap is something tricky, You should eventually edit your question and post your code, or post a new question. – shodanex Mar 16 '09 at 12:33
  • I believe you are correct re shifting virt_to_phys(). The kernel sees the pages in that region as contiguous (though they aren't, just like regions obtained via vmalloc() arent), hence the PFN. I might be incorrect, I'm still looking, this is a very fun question! – Tim Post Mar 16 '09 at 17:11
  • I also checked, (virt_to_phys(x) >> PAGE_SHIFT) is the Right Way (tm). Userspace is not going to be able to dereference the virt address that the kernel uses (which allows it to see those pages as contiguous). – Tim Post Mar 16 '09 at 17:23
  • Yeah, but should virt_to_phys be called on memory that is not mapped as RAM in the first place? – shodanex Mar 16 '09 at 17:55
  • ok; thanks guys! I'll try this when I get back to the office tomorrow (it's an embedded system that runs in my office; I can telnet into the device, but when it kernel panics, I can't reset it remotely). I'll get back to you in ~14 hours... – Mikeage Mar 16 '09 at 19:32
  • wow... this seems to be working! There's a few things that I don't understand, however. ioremap(95*1024*1024,1*1024*1024) returns 0xa5f00000. virt_to_phys()>>PAGE_SHIFT is equal to 0x25F00. The call to mmap returns 0x2AAA9000. – Mikeage Mar 17 '09 at 07:30
  • [cont] Shouldn't virt_to_phys return something that's related to 95M somehow? – Mikeage Mar 17 '09 at 07:30
  • Also, is there any way I can use GDB with this address?
    18 mptr = mmap(0, 1 * 1024 * 1024, PROT_READ | PROT_WRITE, MAP_FILE | MAP_SHARED, fd, 0);
    (gdb) p mptr
    $1 = 0x2aaa8000
    (gdb) x mptr
    0x2aaa8000: Cannot access memory at address 0x2aaa8000
    – Mikeage Mar 17 '09 at 07:36
  • 00400000-00406000 r-xp 00000000 00:0b 64561 /foo-user
    00445000-00446000 rw-p 00005000 00:0b 64561 /foo-user
    00446000-00448000 rwxp 00446000 00:00 0 [heap]
    2aaa8000-2aba8000 rw-s 00000000 00:01 1226 /dev/foo
    7fea3000-7feb8000 rwxp 7fea3000 00:00 0 [stack]
    – Mikeage Mar 17 '09 at 07:39
  • Comments get the author to get a notification, but question is not updated, so it is unlikely it will get new reader by showing on the first page. For your mmap + gdb, could you post a new question, eventually giving a link to this one ? – shodanex Mar 17 '09 at 08:29
15

Sorry to answer but not quite answer, I noticed that you have already edited the question. Please note that SO does not notify us when you edit the question. I'm giving a generic answer here, when you update the question please leave a comment, then I'll edit my answer.

Yes, you're going to need to write a module. What it comes down to is the use of kmalloc() (allocating a physically contiguous region in kernel space) or vmalloc() (allocating a kernel region that is only virtually contiguous).

Exposing the former is easy; exposing the latter can be a pain in the rear with the kind of interface that you are describing as needed. You noted 1.5 MB is a rough estimate of how much you actually need to reserve; is that ironclad? I.e., are you comfortable taking that from kernel space? Can you adequately deal with ENOMEM or EIO from userspace (or even disk sleep)? IOW, what's going into this region?

Also, is concurrency going to be an issue with this? If so, are you going to be using a futex? If the answer to either is 'yes' (especially the latter), it's likely that you'll have to bite the bullet and go with vmalloc() (or risk kernel rot from within). Also, if you are even THINKING about an ioctl() interface to the char device (especially for some ad-hoc locking idea), you really want to go with vmalloc().

Also, have you read this? Plus we aren't even touching on what grsec / selinux is going to think of this (if in use).

Tim Post
  • this is an embedded system; I'm not worried about selinux. Regarding the rest of your Q's, I'm adding some more detail now. – Mikeage Mar 15 '09 at 20:00
5

/dev/mem is okay for simple register peeks and pokes, but once you cross into interrupt and DMA territory, you really should write a kernel-mode driver. What you did for your previous memory-management-less OSes simply doesn't graft well onto a general-purpose OS like Linux.

You've already thought about the DMA buffer allocation issue. Now, think about the "DMA done" interrupt from your device. How are you going to install an Interrupt Service Routine?

Besides, /dev/mem is typically locked out for non-root users, so it's not very practical for general use. Sure, you could chmod it, but then you've opened a big security hole in the system.

If you are trying to keep the driver code base similar between the OSes, you should consider refactoring it into separate user & kernel mode layers with an IOCTL-like interface in-between. If you write the user-mode portion as a generic library of C code, it should be easy to port between Linux and other OSes. The OS-specific part is the kernel-mode code. (We use this kind of approach for our drivers.)
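As an illustration of such a split, the shared piece can be as small as one header defining the request codes and the structures they carry. The sketch below uses the standard Linux _IOR encoding; the magic letter, the request number, struct foo_region_info and foo_get_region are all made up for this example, and the kernel side that would answer the request is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/ioctl.h>    /* _IOR and the _IOC_* decoding macros */
#include <sys/ioctl.h>      /* ioctl() */

/* Data exchanged between the user-mode library and the driver.
 * (Hypothetical layout for this sketch.) */
struct foo_region_info {
    uint64_t phys_base;     /* physical start of the reserved buffer */
    uint64_t size;          /* its length in bytes */
};

#define FOO_IOC_MAGIC      'f'  /* arbitrary; a real driver should pick an unused one */
#define FOO_IOC_GET_REGION _IOR(FOO_IOC_MAGIC, 1, struct foo_region_info)

/* Thin OS-specific wrapper: the portable library code would call this
 * instead of issuing the ioctl directly. Returns 0 on success, -1 on error. */
static int foo_get_region(int fd, struct foo_region_info *out)
{
    assert(out != NULL);
    return ioctl(fd, FOO_IOC_GET_REGION, out);
}
```

Only the body of foo_get_region would need to change on another OS; the callers above it keep working with the same struct.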

It seems like you have already concluded that it's time to write a kernel-driver, so you're on the right track. The only advice I can add is to read these books cover-to-cover.

Linux Device Drivers

Understanding the Linux Kernel

(Keep in mind that these books are circa-2005, so the information is a bit dated.)

myron-semack
2

Have you looked at the 'memmap' kernel parameter? On i386 and x86_64, you can use the memmap parameter to define how the kernel will handle very specific blocks of memory (see the Linux kernel parameter documentation). In your case, you'd want to mark memory as 'reserved' so that Linux doesn't touch it at all. Then you can write your code to use that absolute address and size (woe be unto you if you step outside that space).

Craig Trader
  • I actually did use memmap [or just mem=XXm@YY, which my kernel supports to reserve a block in the middle.] The open issue was how to access the memory directly. – Mikeage Oct 31 '09 at 19:16
2

I am by no means an expert on these matters, so this will be a question to you rather than an answer. Is there any reason you can't just make a small RAM disk partition and use it only for your application? Would that not give you guaranteed access to the same chunk of memory? I'm not sure if there would be any I/O performance issues or additional overhead associated with doing that. This also assumes that you can tell the kernel to partition a specific address range in memory; I'm not sure if that is possible.

I apologize for the newb question, but I found your question interesting, and am curious if ram disk could be used in such a way.

  • Concurrency might be an issue with this, especially when using DM which does not honor write barriers. The question (despite edits) is really ambiguous, how they will actually use that kind of region with that kind of interface is lacking. – Tim Post Mar 15 '09 at 17:15
  • Also, (referencing my first comment) write cache may be an issue, especially with ordering. – Tim Post Mar 15 '09 at 17:36