
I know all the discussions about why one should not read/write files from the kernel and should use /proc or netlink instead. I want to read/write anyway. I have also read "Driving Me Nuts - Things You Never Should Do in the Kernel".

However, the problem is that 2.6.30 does not export sys_read(). Rather it's wrapped in SYSCALL_DEFINE3. So if I use it in my module, I get the following warnings:

WARNING: "sys_read" [xxx.ko] undefined!
WARNING: "sys_open" [xxx.ko] undefined!

Obviously insmod cannot load the module because linking does not happen correctly.

Questions:

  • How can I read/write files within the kernel after 2.6.22 (where sys_read()/sys_open() are not exported)?
  • In general, how can I use system calls wrapped in the SYSCALL_DEFINEn() macro from within the kernel?
red0ct
Methos

2 Answers


Be aware that file I/O from within the Linux kernel should be avoided when possible. The main idea is to go "one level deeper" and call the VFS-level functions instead of the syscall handlers directly:

Includes:

#include <linux/fs.h>
#include <asm/segment.h>
#include <asm/uaccess.h>
#include <linux/buffer_head.h>

Opening a file (similar to open):

/* Returns the opened file, or NULL on failure. */
struct file *file_open(const char *path, int flags, int rights)
{
    struct file *filp;
    mm_segment_t oldfs;

    oldfs = get_fs();
    set_fs(get_ds());
    filp = filp_open(path, flags, rights);
    set_fs(oldfs);

    /* filp_open() reports failure as an encoded error pointer, not NULL.
     * The error code (PTR_ERR(filp)) is discarded here. */
    if (IS_ERR(filp))
        return NULL;
    return filp;
}
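The IS_ERR/PTR_ERR check above relies on the kernel's convention of encoding a negative errno value in the pointer itself. The following is a user-space sketch of that idea only, not the kernel's code from <linux/err.h>: every `sketch_*` name is hypothetical, and `sketch_open` is a made-up stand-in for filp_open. It shows why a failed open must be tested with an IS_ERR-style check rather than a comparison against NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: errors occupy the top 4095 addresses, as in the kernel. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *sketch_err_ptr(long err)
{
    assert(err < 0 && err >= -SKETCH_MAX_ERRNO);
    return (void *)(intptr_t)err;
}

/* True if ptr is an encoded error rather than a usable address. */
static inline int sketch_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Recover the negative errno value from an encoded pointer. */
static inline long sketch_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Hypothetical stand-in for filp_open(): fails with -ENOENT on an empty path. */
static void *sketch_open(const char *path)
{
    static int dummy_file;  /* placeholder object for a "valid" result */

    if (path == NULL || path[0] == '\0')
        return sketch_err_ptr(-ENOENT);
    return &dummy_file;
}
```

Note that NULL is not an encoded error under this scheme, which is why file_open above has to translate the error pointer into NULL explicitly if its callers are meant to test for NULL.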

Close a file (similar to close):

void file_close(struct file *file) 
{
    filp_close(file, NULL);
}

Reading data from a file (similar to pread):

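int file_read(struct file *file, loff_t offset, unsigned char *data, unsigned int size)
{
    mm_segment_t oldfs;
    int ret;

    /* Let vfs_read() accept a kernel-space buffer. */
    oldfs = get_fs();
    set_fs(get_ds());

    /* offset is a local copy, so the caller's offset is not advanced. */
    ret = vfs_read(file, (void __user *)data, size, &offset);

    set_fs(oldfs);
    return ret;
}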

Writing data to a file (similar to pwrite):

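int file_write(struct file *file, loff_t offset, const unsigned char *data, unsigned int size)
{
    mm_segment_t oldfs;
    int ret;

    /* Let vfs_write() accept a kernel-space buffer. */
    oldfs = get_fs();
    set_fs(get_ds());

    /* offset is a local copy, so the caller's offset is not advanced. */
    ret = vfs_write(file, (const void __user *)data, size, &offset);

    set_fs(oldfs);
    return ret;
}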
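Both helpers take the offset by value, like pread and pwrite, so the caller has to track progress through the return value, and a call may transfer fewer bytes than requested. The same contract can be tried out in user space. The sketch below is an illustration with POSIX pread/pwrite, not kernel code; `read_fully_at` and `demo_roundtrip` are hypothetical names, and the temporary file path is an assumption.

```c
#define _XOPEN_SOURCE 700   /* for pread, pwrite, mkstemp */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly `size` bytes starting at `offset`, looping over short reads.
 * Returns 0 on success, -1 on error or premature end of file. */
static int read_fully_at(int fd, off_t offset, void *data, size_t size)
{
    unsigned char *p = data;

    assert(data != NULL);
    while (size > 0) {
        ssize_t n = pread(fd, p, size, offset);
        if (n <= 0)
            return -1;
        p += n;             /* advance our own bookkeeping ... */
        offset += n;        /* ... because pread never moves the file position */
        size -= (size_t)n;
    }
    return 0;
}

/* Demo: write "hello world" to a temporary file, then read the 5 bytes
 * at offset 6 into `out` and NUL-terminate them. Returns 0 on success. */
static int demo_roundtrip(char out[6])
{
    char path[] = "/tmp/pread_sketch_XXXXXX";
    int rc = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);           /* file disappears once fd is closed */

    if (pwrite(fd, "hello world", 11, 0) == 11 &&
        read_fully_at(fd, 6, out, 5) == 0) {
        out[5] = '\0';
        rc = 0;
    }
    close(fd);
    return rc;
}
```

A kernel-side caller of file_read or file_write would need the same kind of loop if it must transfer a full buffer.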

Syncing changes to a file (similar to fsync):

int file_sync(struct file *file)
{
    /* Propagate the result instead of always reporting success. */
    return vfs_fsync(file, 0);
}

[Edit] Originally, I proposed using file_fsync, which is gone in newer kernel versions. Thanks to the user who suggested the change; unfortunately, the edit was rejected before I could review it.

red0ct
dmeister
  • Thank you. I was thinking of doing something similar by replicating the sys_read/sys_open functionality, but this is a great help. Out of curiosity, is there any way to use system calls declared using SYSCALL_DEFINE? – Methos Jul 26 '09 at 12:48
  • I tried this code in kernel 2.6.30 (Ubuntu 9.04) and reading the file crashes the system. Has anyone experienced the same issue? – Enrico Detoma Oct 13 '09 at 08:35
  • @Enrico Detoma: Oh, wow. Is there any way that you can give me the module you used? I have never seen that before. – dmeister Oct 13 '09 at 09:36
  • That immediately raises the question of "why are you doing that FS dance, btw", which is answered quite nicely here: http://www.linuxjournal.com/node/8110/print under the "Fixing the Address Space" section. – PypeBros Aug 24 '11 at 14:10
  • @dmeister, your "VFS level functions" link returns Object Not Found. – uss Apr 01 '14 at 18:47
  • @EnricoDetoma An old comment, but perhaps somebody would run into the same problem. In my case putting file_* function definitions in the same .c file fixed the problem (no idea why). – AlexSee Jan 16 '15 at 11:35
  • For OS X users who came here, the VFS kpi is `vfs_context_create`, `vnode_open`, `vn_rdwr`, ... –  Aug 11 '16 at 10:54
  • In file_write function, how to write an array of integers, instead of array of chars? – felipeduque Sep 06 '18 at 12:57
  • I suggest you replace the write signature with this. I changed types of argument 2, 3 to prevent this error `error: pointer targets in passing argument 4 of ‘vfs_write’ differ in signedness [-Werror=pointer-sign] ret = vfs_write(file, data, size, &offset); `. I changed to size_t because it's more correct. SUGGESTION: `int file_write(struct file *file, loff_t offset, char *data, unsigned int size)` – benathon Sep 14 '21 at 08:44

Since version 4.14 of the Linux kernel, the vfs_read and vfs_write functions are no longer exported for use in modules. Instead, functions dedicated to file access from the kernel are provided:

/* Read the file from kernel space. */
ssize_t kernel_read(struct file *file, void *buf, size_t count, loff_t *pos);

/* Write the file from kernel space. */
ssize_t kernel_write(struct file *file, const void *buf, size_t count,
            loff_t *pos);

Also, filp_open no longer accepts a user-space string, so it can be called directly from kernel code (without the set_fs dance).

Tsyvarev
  • What alternatives do we have other than `filp_open`? I know we shouldn't do file operations on userspace files through the kernel, but let's say we want to. – vmemmap May 28 '22 at 11:41
  • The `filp_open` function is accessible. Why do you want to use other function instead? – Tsyvarev May 28 '22 at 11:58
  • When I try to open a userspace file, it just fails, claiming it was trying to dereference a null pointer, probably because it cannot access a file that is placed in the userspace area. E.g. `~/.text` – vmemmap May 28 '22 at 12:01
  • There is no such thing like a "file placed in userspace area". All files (including the ones under `~`) are stored in the single namespace. But the `~` is the concept of the **shell**: this character is not processed by the kernel and non-shell programs. The kernel is even not aware about a user's home directory: this concept is maintained by user space part of OS. For access a file under user's home directory from the kernel you need to specify that directory as "normal" path. E.g. `/home/tester/.text`. – Tsyvarev May 28 '22 at 12:12
  • yea I mean `~/.text` was just an example, any other abs path that I provide doesn't seem to work at all, after debugging the kernel seems to abort with `dereferencing a null pointer` and it is indeed the first argument that causes it, but if you say so then ok. – vmemmap May 28 '22 at 12:16
  • No, it is not OK for the kernel to fail with `dereferencing a null pointer` when the first parameter to `filp_open` is a valid pointer to the string in the kernel. You are better to create **separate question** about your problem: your description in the comments is very vague and unclear. – Tsyvarev May 28 '22 at 12:20