
I am writing a driver as a module, and I need to invoke the system call sys_epoll_create1() from it. I wrote a module like this:

#include <linux/init.h> 
#include <linux/module.h> 
#include <linux/kernel.h> 
#include <linux/net.h> 
#include <linux/syscalls.h> 
#include <linux/eventpoll.h> 
#include <net/sock.h> 
MODULE_LICENSE("GPL"); 
static int hello_init(void) 
{ 
    sys_epoll_create1(1); 
    return 0; 
} 

static void hello_exit(void) 
{ 

} 

module_init(hello_init); 
module_exit(hello_exit); 

The build log looks like this:

~/test $ make 
make -C /lib/modules/4.2.0-16-generic/build M=/home/kyl/test modules 
make[1]: Entering directory '/usr/src/linux-headers-4.2.0-16-generic' 
CC [M]  /home/kyl/test/hello.o 
Building modules, stage 2. 
MODPOST 1 modules 
WARNING: "sys_epoll_create1" [/home/kyl/test/hello.ko] undefined! 
CC      /home/kyl/test/hello.mod.o 
LD [M]  /home/kyl/test/hello.ko 
make[1]: Leaving directory '/usr/src/linux-headers-4.2.0-16-generic' 

As far as I can tell, sys_epoll_create1() is declared in linux/syscalls.h:

asmlinkage long sys_epoll_create1(int flags);

I have included the header <linux/syscalls.h>, so why does the build still report WARNING: "sys_epoll_create1" [/home/kyl/test/hello.ko] undefined!?

KyL
  • Did you try modifying the kernel code to export that syscall to the rest of the kernel? – Claudio Dec 17 '15 at 07:57
  • @Claudio That is my last resort. I would prefer to build the module without modifying the kernel. – KyL Dec 17 '15 at 08:58
  • The Linux kernel no longer exports (`EXPORT_SYMBOL`) syscall implementations (`sys_*` functions). See, e.g., [this question](http://stackoverflow.com/questions/1184274/how-to-read-write-files-within-a-linux-kernel-module) about `sys_read` and `sys_open`. Unlike reading and writing files, which have *exported* `vfs_*` replacements, the `epoll`-related functions are not exported to modules, so you cannot create an epoll file descriptor with `epoll_create1` and return it *to the user*. But if you only want to *poll* some set of files *inside the kernel*, there are ways to do that. – Tsyvarev Dec 17 '15 at 10:05
  • @Tsyvarev Is it possible to copy `fs/eventpoll.c` into my module source tree and build with the duplicated `eventpoll.c`? – KyL Dec 17 '15 at 14:37
  • You may try, but loop detection for eventpoll file descriptors will not work. Actually, using epoll inside the kernel looks odd. But without knowing your ultimate purpose, it is difficult to advise anything. – Tsyvarev Dec 17 '15 at 16:46
  • @Tsyvarev I am just interested in writing an HTTP web server in kernel space, as an experiment. I know there are in-kernel servers like khttpd and TUX that use multiple kernel threads to handle concurrent connections. I want to try multiplexing inside the kernel. It is just a personal experiment. – KyL Dec 19 '15 at 14:23
  • So you only need to **poll** some files **inside the kernel**, am I right? – Tsyvarev Dec 19 '15 at 15:40
  • @Tsyvarev Yes, I need to monitor some sockets inside the kernel. – KyL Dec 21 '15 at 03:23

1 Answer


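The Linux kernel no longer exports (EXPORT_SYMBOL) syscall implementations (sys_* functions). The declaration in linux/syscalls.h lets the module compile, but the symbol cannot be resolved when the module is linked, hence the modpost warning. See, e.g., this question about sys_read and sys_open. Unlike reading and writing files, which have exported vfs_* replacements, the epoll-related functions are not exported to modules, so you cannot create an epoll file descriptor with epoll_create1 and return it to the user.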

However, it is possible to implement a kind of select/poll syscall inside the kernel module. The list of file descriptors to monitor can be stored in any way you like.

#include <linux/file.h>
#include <linux/poll.h>

void my_select(void)
{
    struct poll_wqueues table;
    poll_table *wait;

    poll_initwait(&table);
    wait = &table.pt;

    for (;;) {
        // Call ->poll() for every monitored file descriptor (i).
        for (<i in monitored files>) {
            struct fd f = fdget(i);
            if (f.file) {
                const struct file_operations *f_op = f.file->f_op;
                int mask_output = DEFAULT_POLLMASK; // Mask of available events
                int mask_input = <mask of events which are monitored for given file>;
                if (f_op && f_op->poll) {
                    wait->_key = mask_input;
                    mask_output = f_op->poll(f.file, wait);
                }
                fdput(f); // Drop the reference taken by fdget().
                if (mask_output & mask_input) {
                    // Some requested events are available. Mark that fact in some way.
                }
            }
        }
        if (<some requested events have fired>) break;
        // Important: clear the queueing callback so that further ->poll()
        // calls do not add the wait entry to the waitqueues again.
        wait->_qproc = NULL;
        // Wait for events if none have fired yet.
        // For waits with a timeout, poll_schedule_timeout() can be used.
        poll_schedule(&table, TASK_INTERRUPTIBLE);
    }
    poll_freewait(&table);
}

This is essentially a simplified version of the kernel function do_select (in fs/select.c), so see that function for more details.

A similar polling mechanism is used in the serial2002 driver. However, that driver does not clear the poll table's queueing callback between wait iterations, so a long wait with many spurious wakeups can end in ENOMEM.
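The mask check at the heart of the loop above (compare the events a file reports against the events you asked for) can be tried from user space before moving it into a module. The following is only a user-space sketch built on poll(2), not kernel code: the helper name `wait_ready` and its return convention are assumptions made for illustration.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper (name and return convention are illustrative):
 * wait until a descriptor in fds[] reports one of its requested events,
 * or until timeout_ms elapses.
 * Returns the index of the first descriptor with a requested event,
 * or -1 on timeout, on error, or if only unrequested events
 * (e.g. POLLHUP) were reported.
 */
static int wait_ready(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
    int n = poll(fds, nfds, timeout_ms);

    if (n <= 0)
        return -1;

    for (nfds_t i = 0; i < nfds; i++) {
        /* Same test as "mask_output & mask_input" in the kernel loop. */
        if (fds[i].revents & fds[i].events)
            return (int)i;
    }
    return -1;
}
```

With a pipe, for instance, calling `wait_ready` on the read end with `POLLIN` and a zero timeout gives -1 while the pipe is empty, and index 0 once a byte has been written to the write end.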

Tsyvarev