
I'm trying to understand how to use the new asyncio functionality in Python 3.4, and I'm struggling with event_loop.add_reader(). From the limited discussion I've found, it looks like it's meant for reading the standard output of a separate process rather than the contents of an open file. Is that true? If so, it appears there's no asyncio-specific way to integrate standard file I/O. Is that also true?

I've been playing with the following code. It raises PermissionError: [Errno 1] Operation not permitted at line 399 of /python3.4/selectors.py, in self._epoll.register(key.fd, epoll_events), triggered by the add_reader() line below.

import asyncio
import urllib.parse
import sys
import pdb
import os

def fileCallback(*args):
    pdb.set_trace()

path = sys.argv[1]
loop = asyncio.get_event_loop()
#fd = os.open(path, os.O_RDONLY)
fd = open(path, 'r')
#data = fd.read()
#print(data)
#fd.close()
pdb.set_trace()
task = loop.add_reader(fd, fileCallback, fd)
loop.run_until_complete(task)
loop.close()

EDIT

For those looking for an example of how to use asyncio to read more than one file at a time, as I was, here's one way it can be done. The secret is the line yield from asyncio.sleep(0). It pauses the current function and puts it back in the event loop queue, to be resumed after all other ready functions have run. Whether a function is ready depends on how it was scheduled.

import asyncio

@asyncio.coroutine
def read_section(file, length):
    yield from asyncio.sleep(0)
    return file.read(length)

@asyncio.coroutine
def read_file(path):
    fd = open(path, 'r')
    retVal = []
    cnt = 0
    while True:
        cnt = cnt + 1
        data = yield from read_section(fd, 102400)
        print(path + ': ' + str(cnt) + ' - ' + str(len(data)))
        if len(data) == 0:
            break
    fd.close()

paths = ["loadme.txt", "loadme also.txt"]
loop = asyncio.get_event_loop()
tasks = []
for path in paths:
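    # asyncio.async() was renamed to ensure_future() in Python 3.4.4; "async" is now a reserved word.
    tasks.append(asyncio.ensure_future(read_file(path)))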
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
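
For readers on newer Python versions, here is a minimal sketch of the same idea written with async/await, assuming Python 3.7+ (for asyncio.run). The helper names read_all and demo, and the temporary demo files, are made up for illustration. Note that read() is still an ordinary blocking call; asyncio.sleep(0) only lets other ready tasks take a turn between chunks.

```python
import asyncio
import os
import tempfile

CHUNK_SIZE = 102400  # same chunk size as the example above


async def read_file(path):
    """Read a file chunk by chunk, yielding to the event loop between chunks."""
    chunk_sizes = []
    with open(path, "r") as f:
        while True:
            # sleep(0) gives other ready tasks a turn; the read() below
            # is still an ordinary blocking call.
            await asyncio.sleep(0)
            data = f.read(CHUNK_SIZE)
            if not data:
                break
            chunk_sizes.append(len(data))
    return chunk_sizes


async def read_all(paths):
    # gather() schedules one task per file and returns results in input order.
    return await asyncio.gather(*(read_file(p) for p in paths))


def demo():
    # Hypothetical demo files created in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, size in (("a.txt", 250000), ("b.txt", 10)):
            path = os.path.join(tmp, name)
            with open(path, "w") as f:
                f.write("x" * size)
            paths.append(path)
        return asyncio.run(read_all(paths))


if __name__ == "__main__":
    print(demo())
```

demo() returns one list of chunk lengths per file, so the interleaving can be checked without relying on print order.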
Josh Russo
  • See [this question](http://stackoverflow.com/questions/8645721/why-does-select-select-work-with-disk-files-but-not-epoll) for why this is failing; `epoll` doesn't support regular files. – dano Aug 17 '14 at 18:16
  • @dano: If I did this on FreeBSD, would it use kqueue and work with regular files? – Janus Troelsen May 17 '16 at 09:11
  • I'm not sure exactly, but I do know that asyncio's aim is to expose the file system's standard I/O callbacks. So if that's the standard way FreeBSD performs I/O callbacks, then probably. – Josh Russo May 17 '16 at 10:31
  • @JanusTroelsen On FreeBSD the `SelectorEventLoop` gets used, which uses the [`selectors`](https://docs.python.org/3/library/selectors.html#module-selectors) module to choose the most efficient selector for the platform. If that's kqueue, then that should be what `selectors` chooses. I don't know if that will make `add_reader` work with regular files, though. If you give it a try, let me know how it goes! – dano May 17 '16 at 14:15

2 Answers


These functions expect a file descriptor, that is, the underlying integer the operating system uses, not a Python file object. File objects that are based on file descriptors return that descriptor from their fileno() method, for example:

>>> import sys
>>> sys.stderr.fileno()
2

In Unix, file descriptors can be attached to files or a lot of other things, including other processes.
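
As a small illustration of the difference, the sketch below (assuming a Unix-like system; descriptor_demo is a made-up name) opens the same temporary file both ways: through a file object, whose descriptor comes from fileno(), and through os.open(), which returns the integer directly.

```python
import os
import tempfile


def descriptor_demo():
    """Compare the descriptor behind a file object with one from os.open()."""
    with tempfile.NamedTemporaryFile("w+") as f:
        fd_from_object = f.fileno()            # int behind the file object
        raw_fd = os.open(f.name, os.O_RDONLY)  # os.open() returns an int directly
        try:
            return (
                isinstance(fd_from_object, int),
                isinstance(raw_fd, int),
                raw_fd != fd_from_object,  # two separate descriptors for one file
            )
        finally:
            os.close(raw_fd)


if __name__ == "__main__":
    print(descriptor_demo())
```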

Edit for the OP's edit:

As Max says in the comments, you cannot use epoll on local files (and asyncio uses epoll on Linux). Yes, that's kind of weird. You can use it on pipes, though, for example:

import asyncio
import sys

def fileCallback(*args):
    # Called by the event loop whenever stdin has data ready to read.
    print("Received: " + sys.stdin.readline())

loop = asyncio.get_event_loop()
# add_reader() takes a file descriptor (an int) and returns None, not a task.
loop.add_reader(sys.stdin.fileno(), fileCallback)
loop.run_forever()

This will echo stuff you write on stdin.
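
A self-contained variant that needs no typing at a terminal can be sketched with os.pipe(). This assumes a Unix-like system where the default loop is selector-based; pipe_reader_demo is a made-up name. The read end of the pipe is registered with add_reader(), a message is written to the write end, and the callback stops the loop once it has read it.

```python
import asyncio
import os


def pipe_reader_demo():
    """Register the read end of a pipe with add_reader() and collect one message."""
    loop = asyncio.new_event_loop()
    read_fd, write_fd = os.pipe()
    received = []

    def on_readable():
        # The loop calls this once the pipe has data, so os.read() returns at once.
        received.append(os.read(read_fd, 1024))
        loop.remove_reader(read_fd)
        loop.stop()

    try:
        loop.add_reader(read_fd, on_readable)
        os.write(write_fd, b"hello")
        loop.run_forever()
    finally:
        loop.close()
        os.close(read_fd)
        os.close(write_fd)
    return received


if __name__ == "__main__":
    print(pipe_reader_demo())
```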

Jorgen Schäfer
  • Ok, so then in my example the `os.open()` which returns a numeric file descriptor should work? Because it gives me the same result – Josh Russo Aug 17 '14 at 18:06
  • Local files usually cannot be selected or polled on, because they do not block. – Max Aug 17 '14 at 18:08
  • Updated my answer to reflect your updated question :-) – Jorgen Schäfer Aug 17 '14 at 18:24
  • Is that just because of the efficiency of loading files locally? If you had a large file or files that needed to be loaded would you experience blocking? – Josh Russo Aug 17 '14 at 18:25
  • Ok, I think I understand. To interleave large file loads, a `yield from` statement would need to be injected into the read loop of the file load. Does that sound accurate? – Josh Russo Aug 17 '14 at 18:35
  • Asynchronous I/O does not do the actual input/output asynchronously, but rather lets you respond to input being available (or space for output being available). The read() call still blocks the process while data is transferred, but as the transfer is only a memory copy between the kernel and the user process, it's very, very fast. Local files are always available for read, so the situation where reading from them would block does not arise. Reading can block when for example network output is not available yet. – Jorgen Schäfer Aug 17 '14 at 18:59
  • And yes, if reading a large file into the process memory all at once would take too long, you can do reads in smaller chunks (`os.read` has a buffersize argument) and yield in between those chunks. – Jorgen Schäfer Aug 17 '14 at 19:01
  • On Windows, this throws an "OSError [WinError 10038] An operation was attempted on something that is not a socket" – blokeley Jul 20 '15 at 05:11

You cannot use add_reader on local files because:

  • It cannot be done using select/poll/epoll
  • It depends on the operating system
  • It cannot be fully asynchronous because of OS limitations (Linux does not support asynchronous filesystem metadata reads/writes)

But technically, yes, you should be able to do asynchronous filesystem reads and writes: (almost) all systems have a DMA mechanism for doing I/O "in the background". And no, local I/O is not so fast that no one would want this; CPUs are on the order of millions of times faster than disk I/O.

Look at aiofile or aiofiles if you want to try asynchronous file I/O.
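
If installing a third-party package is not an option, a standard-library way to keep blocking file reads off the event loop thread is loop.run_in_executor(). The sketch below uses neither aiofile nor aiofiles; it assumes Python 3.7+, and the names read_files and demo_executor, along with the temporary demo files, are made up for illustration.

```python
import asyncio
import os
import tempfile


def _read_whole_file(path):
    # Ordinary blocking read; it runs in a worker thread, not on the loop thread.
    with open(path, "rb") as f:
        return f.read()


async def read_files(paths):
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default thread pool executor.
    futures = [loop.run_in_executor(None, _read_whole_file, p) for p in paths]
    return await asyncio.gather(*futures)


def demo_executor():
    # Hypothetical demo files created in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(3):
            path = os.path.join(tmp, "file%d.bin" % i)
            with open(path, "wb") as f:
                f.write(bytes([i]) * (i + 1))
            paths.append(path)
        return asyncio.run(read_files(paths))


if __name__ == "__main__":
    print(demo_executor())
```

The event loop stays free to run other tasks while the worker threads wait on the disk, at the cost of one thread per in-flight read.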

Michel