Named memory-mapped files in Python?

Question

I'm using OpenCV to process some video data in a web service. Before calling OpenCV, the video is already loaded into a bytearray buffer, which I would like to pass to a VideoCapture object:

# The following raises cv2.error because it can't convert '_io.BytesIO' to 'str' for 'filename'
cap = cv2.VideoCapture(buffer)

Unfortunately, VideoCapture() expects a string filename, not a buffer. For now, I'm saving the bytearray to a temporary file and passing its name to VideoCapture().

Questions:

Is there a way to create named in-memory files in Python, so I can pacify OpenCV?
Alternatively, is there another OpenCV API which does support buffers?

Peter Badida · Accepted Answer · 2021-09-26T09:35:32.150


Note: this is POSIX-specific! As you haven't provided an OS tag, I assume that's okay.

According to this answer (and the shm_overview manpage), /dev/shm is always present on the system. It's a tmpfs backed by a shared memory pool (not the Python process's memory), as suggested here, but the advantage is that you don't need to create it yourself, so there's no need for hacks like:

os.system("mount ...") or
Popen(["mount", ...]) wrappers.

Simply use tempfile.NamedTemporaryFile() like this:

from tempfile import NamedTemporaryFile
with NamedTemporaryFile(dir="/dev/shm") as file:
    print(file.name)
    # /dev/shm/tmp2m86e0e0

You could then pass that name to OpenCV's API wrapper (cv2.VideoCapture). Alternatively, use pyfilesystem as a more extensive wrapper around that device/filesystem.
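A minimal sketch of that approach applied to the question's bytearray, assuming /dev/shm exists and is a RAM-backed tmpfs (it falls back to the platform's default temp directory otherwise). The helper name named_memory_file is hypothetical; the data is flushed before the name is handed out, so a second reader such as OpenCV sees all of it.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator, Optional, Union

# Assumption: /dev/shm is a RAM-backed tmpfs, as on typical Linux systems.
SHM_DIR = "/dev/shm"


@contextmanager
def named_memory_file(
    data: Union[bytes, bytearray],
    suffix: str = "",
    directory: Optional[str] = None,
) -> Iterator[str]:
    """Write `data` to a named temporary file and yield its path.

    Uses /dev/shm when it exists (unless `directory` is given), otherwise the
    platform's default temp directory. The file is deleted when the block exits.
    """
    if directory is None and os.path.isdir(SHM_DIR):
        directory = SHM_DIR
    with tempfile.NamedTemporaryFile(suffix=suffix, dir=directory) as f:
        f.write(data)
        f.flush()  # push the bytes to the file so other readers see all of them
        yield f.name
```

Hypothetical usage: `with named_memory_file(buffer, suffix=".mp4") as path: cap = cv2.VideoCapture(path)`, calling `cap.release()` before the block ends.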

Also, multiprocessing.heap.Arena uses it, so if it didn't work, there would be much bigger trouble. For Windows, check this implementation, which uses the winapi.

For the size of /dev/shm:

this is one of the size "specifications" I found;
shm.h, shm_add_rss_swap() and newseg() in the Linux source code may hold more details.
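To see how large /dev/shm actually is on a given machine, one option is os.statvfs(), which reports the size of the filesystem containing a path; for a tmpfs the total reflects the mount's configured size limit. A minimal sketch; the helper fs_space is hypothetical.

```python
import os
from typing import NamedTuple


class FsSpace(NamedTuple):
    total: int      # filesystem size in bytes
    available: int  # bytes available to unprivileged users


def fs_space(path: str = "/dev/shm") -> FsSpace:
    """Report the size of the filesystem that contains `path` (POSIX only)."""
    st = os.statvfs(path)
    # f_frsize is the fundamental block size; block counts are in those units.
    return FsSpace(
        total=st.f_blocks * st.f_frsize,
        available=st.f_bavail * st.f_frsize,
    )
```

For example, `fs_space().total / 2**30` gives the limit of /dev/shm in GiB.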

Judging by sudo ipcs, it's most likely what you want to use for sharing data between processes if you don't use sockets, pipes or the disk.

As it's POSIX, it should work on POSIX-compliant systems such as Solaris (though not on macOS), but I have no means to try it.

edited Sep 26 '21 at 09:35

answered Sep 25 '21 at 18:29

Peter Badida


Using `/dev/shm` is a good idea. I didn't propose it only because you don't (afaik) get to say how big it is, which *might* be important. I'd rather make a tmpfs myself and *know* I won't kill the system, but YMMV – 2e0byo Sep 25 '21 at 19:02
@2e0byo fair point. I'll try to dig out some constant or setting from the kernel source code and edit the answer if I find it. For now I've only added the linuxquestions.org reference. – Peter Badida Sep 25 '21 at 19:09
Interestingly, the output of `cat /proc/mounts | grep tmpfs` on my system suggests `/dev/shm` is not limited by a mount option (but might easily be limited by something else), whilst /tmp is, and /run/user/1000 is even *more* limited: `tmpfs /dev/shm tmpfs rw,nosuid,nodev,inode64 0 0 tmpfs /tmp tmpfs rw,nosuid,nodev,nr_inodes=409600,inode64 0 0 tmpfs /run/user/1000 tmpfs rw,nosuid,nodev,relatime,size=376112k,nr_inodes=94028,mode=700,uid=1000,gid=1000,inode64 0 0` – 2e0byo Sep 25 '21 at 19:29

shm.h seems to contain max size constants, so I think it *is* limited – 2e0byo Sep 25 '21 at 19:31
@2e0byo I'd say it has to be. There's no such thing as unlimited RAM/storage; otherwise the kernel itself would hang from running out of memory if I just ran `cat /dev/urandom > /dev/shm/stuff` or similarly shot myself in the foot. Most likely, though, it's either distro/family-specific or simply limited in the code itself, while the tmpfs mount leaves it "unlimited". Or maybe the kernel starts with one big memory pool for all tmpfs instances and then allocates a part of it to each instance, such as `/dev/shm`, `/tmp`, etc. – Peter Badida Sep 25 '21 at 19:39

2e0byo · Answer 2 · 2021-09-25T17:15:29.017

To partially answer the question: there is no way I know of in Python to create named file-like objects that point to memory; that's something for the operating system to do. There is, however, a very easy way to do something very much like creating named memory-mapped files on most modern *nixes: save the file to /tmp. These days /tmp is often a ramdisk. But of course it might be zram (basically a compressed ramdisk), and you likely want to check that first. At any rate, it's better than thrashing your disk or depending on OS caching.
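A Linux-only sketch of that check: it parses /proc/mounts and picks the mount point that contains a given path, returning its device and filesystem type. A type of tmpfs means RAM-backed; a device named like /dev/zram0 suggests zram. The helper mount_for is hypothetical, and escaped characters in mount points (such as \040 for a space) are not decoded.

```python
import os
from typing import Optional, Tuple


def mount_for(path: str, mounts_text: Optional[str] = None) -> Optional[Tuple[str, str]]:
    """Return (device, fstype) of the mount containing `path`, or None.

    Linux only: parses /proc/mounts unless `mounts_text` is supplied.
    The longest matching mount point wins, e.g. /tmp over /.
    """
    if mounts_text is None:
        with open("/proc/mounts") as f:
            mounts_text = f.read()
    target = os.path.realpath(path)
    best: Optional[Tuple[str, str]] = None
    best_len = -1
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mount_point, fs_type = fields[0], fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if target == mount_point or target.startswith(prefix):
            # ">=" lets a later mount on the same point shadow an earlier one.
            if len(mount_point) >= best_len:
                best, best_len = (device, fs_type), len(mount_point)
    return best


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    print(mount_for("/tmp"))
```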

Incidentally, making a dedicated ramdisk is as easy as mount -t tmpfs -o size=1G tmpfs /path/to/tmpfs, or similarly with ramfs.

Looking into it, I don't think you're going to have much luck with alternative APIs either: the use of filenames goes right down to cap.cpp, where we have things like:

VideoCapture::VideoCapture(const String& filename, int apiPreference) : throwOnFail(false)
{
    CV_TRACE_FUNCTION();
    open(filename, apiPreference);
}

It seems the Python bindings are just a thin layer on top of this, but I'm willing to be proven wrong!

References

https://github.com/opencv/opencv/blob/master/modules/videoio/src/cap.cpp#L72

Christoph Rackwitz · Answer 3 · 2021-09-25T17:44:20.400

If VideoCapture were a regular Python object that accepted "file-like objects" in addition to paths, you could feed it a "file-like object" and it could read from that.

Python's StringIO and BytesIO are file-like objects in memory. Something useful to remember ;)
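To illustrate the "path or file-like object" convention: a hypothetical helper (read_head is not part of any library) that accepts either a filesystem path or a binary file object, so a bytearray like the question's can be wrapped in io.BytesIO instead of being written to disk. cv2.VideoCapture does not follow this convention, which is the whole problem.

```python
import io
import os
from typing import BinaryIO, Union


def read_head(source: Union[str, os.PathLike, BinaryIO], n: int = 16) -> bytes:
    """Return the first `n` bytes from a path or from a binary file-like object."""
    if isinstance(source, (str, os.PathLike)):
        with open(source, "rb") as f:
            return f.read(n)
    # Anything with a .read() method works, e.g. io.BytesIO wrapping a bytearray.
    return source.read(n)


buffer = bytearray(b"fake video bytes for illustration")
print(read_head(io.BytesIO(buffer), 4))  # prints b'fake'
```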

OpenCV specifically expects a file system path there, so that's out of the question.

OpenCV is a library for computer vision. It's not a library for handling video files.

You should look into PyAV. It's a (proper!) wrapper for ffmpeg's libraries. You can feed data to it directly and it will decode it. Here are some examples, and here are its tests, which demonstrate further functionality. Its documentation is thin because most usage is (or should have been...) documented by ffmpeg itself.
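A minimal sketch of decoding an in-memory buffer with PyAV, assuming PyAV is installed (pip install av). av.open() accepts a file-like object, so the bytes are wrapped in io.BytesIO; each frame is converted with to_ndarray(format="bgr24"), which yields arrays in OpenCV's BGR channel order, so they can still be handed to OpenCV's vision functions afterwards. The helper name decode_video_frames is hypothetical.

```python
import io
from typing import Any, Iterator, Union


def decode_video_frames(data: Union[bytes, bytearray]) -> Iterator[Any]:
    """Yield each video frame as a BGR numpy array of shape (height, width, 3).

    Assumes PyAV is installed; it is imported lazily, on first iteration.
    """
    import av  # PyAV: Python bindings for ffmpeg's libraries

    # av.open() accepts file-like objects, so no temporary file is needed.
    with av.open(io.BytesIO(data)) as container:
        for frame in container.decode(video=0):  # first video stream
            yield frame.to_ndarray(format="bgr24")
```

Hypothetical usage: `for img in decode_video_frames(buffer): gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)`.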

Thanks for suggesting PyAV. I'm using OpenCV, as we'll probably need vision down the road. — bavaza, Sep 26 '21 at 05:00

score 1 · Answer 4 · answered Sep 25 '21 at 17:49


You might be able to get away with a named pipe. You can use os.mkfifo to create one, then use the multiprocessing module to spawn a background process that feeds the video data into it. Note that mkfifo is not supported on Windows.

The most important limitation is that a pipe does not support seeking, so your video won't be seekable or rewindable either. And whether it actually works might depend on the video format and on the backend (gstreamer, v4l2, ...) that OpenCV is using.
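A POSIX-only sketch of the named-pipe idea. To keep it short, it swaps the multiprocessing process for a background thread (blocking pipe writes release the GIL in CPython, so a thread is enough for illustration). The helper bytes_via_fifo is hypothetical; the FIFO lives in a fresh temporary directory and is removed on exit, and the reader gets a read-once, non-seekable stream.

```python
import os
import tempfile
import threading
from contextlib import contextmanager
from typing import Iterator, Union


@contextmanager
def bytes_via_fifo(data: Union[bytes, bytearray], name: str = "video.fifo") -> Iterator[str]:
    """Serve `data` once through a named pipe and yield the pipe's path.

    POSIX only (os.mkfifo is unavailable on Windows).
    """
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, name)
    os.mkfifo(path)

    def feed() -> None:
        try:
            # Opening a FIFO for writing blocks until a reader opens it.
            with open(path, "wb") as pipe:
                pipe.write(data)
        except BrokenPipeError:
            pass  # the reader closed its end before consuming everything

    writer = threading.Thread(target=feed, daemon=True)
    writer.start()
    try:
        yield path
    finally:
        for _ in range(50):
            if not writer.is_alive():
                break
            # Briefly open the read end so a writer stuck in open() can proceed;
            # its write may then fail with BrokenPipeError, which feed() ignores.
            os.close(os.open(path, os.O_RDONLY | os.O_NONBLOCK))
            writer.join(timeout=0.1)
        os.unlink(path)
        os.rmdir(tmpdir)
```

Hypothetical usage: `with bytes_via_fifo(buffer) as path: cap = cv2.VideoCapture(path)`. Whether this works still depends on the video format and OpenCV backend, as noted above.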


Thomas


From the file object/filesystem point of view it seems like a broken bike. Then again, if only a simple read-once buffer is necessary, that's pretty much the purpose (or at least one of them) of a pipe. +1, I'll definitely use it this way in the future. Kind of socket-like behavior. – Peter Badida Sep 25 '21 at 18:44

Indeed. As for the "broken bike" metaphor: in Unix, not everything on the filesystem has to be a "regular file". For example, `/dev/stdin` and `/dev/stdout` are also streams, and so are most device nodes that communicate with hardware. But sure enough, some libraries _will_ assume that the file you pass to them is seekable. – Thomas Sep 26 '21 at 09:29
Yes, hence the metaphor. It's a really good way to get a quick buffer between any two pieces of software, but beyond that it's not really useful (compared with an in-memory file or shared memory). It's probably the cheapest OS-specific option, though, so in some cases it might even be preferable to `/dev/shm`. Like a bike without brakes or without a chain, you *can still* coast down the hill on it and it'll work just fine. Just not the rest of the stuff. :D – Peter Badida Sep 26 '21 at 09:33
