
I was working on a program. I don't think I need to show it here, but I was wondering whether it is possible to create a virtual file system stored in a single file. For example, if I have a file named myfilesystem.fs, is there a way to create a virtual file system inside that single file only? Basically:

/home/xcodz/
    |
    +--myfilesystem.fs
       |
       +--testdir
       +--test.txt
       +--downloads
          |
          +--example1.txt

I basically want a basic filesystem interface: no owners, dates or other metadata. Zip seemed like a good idea, but it reads the whole file into memory at once and does not provide a file-like interface. So I require a very basic file system in a single file, in which I can use files like normal IO objects.

EDIT: A single file stored in the file system can be as big as 3 GB, and I do not have that much RAM. Tar files don't seem to make my work any better.

EDIT: I really mean a file system like the one VirtualBox uses.

AmaanK
  • I think it's possible. That's how VMware disks (a file on your hard disk) act as a collection of files. – PaxPrz Dec 31 '20 at 10:00
  • Can you please edit the question to add the use case? Do you really need support for all the filesystem features in python (e.g. named pipes, ownership, permissions, file handles, etc.)? – root Jan 01 '21 at 08:22
  • Not all features, just a basic file system. – AmaanK Jan 02 '21 at 07:18
  • Maybe just JSON, like {"my folder": {"my file": "my content"}, "another file": "another content"} – Rocket Nikita Jan 02 '21 at 18:53
  • Nope, that can't handle it. I need a basic file system, but it must not load everything at once: file-like operations (streams), a single file, a Python implementation. – AmaanK Jan 03 '21 at 07:55
  • What are you trying to achieve? Why isn't placing your files in a directory good enough? – Chen A. Jan 04 '21 at 07:59
  • Because I really want to limit the file system and wrap over objects of the Python runtime at the moment, so I can execute untrusted scripts without them modifying my system. – AmaanK Jan 04 '21 at 08:06
  • Also, I really am looking for a file system, not workarounds. – AmaanK Jan 04 '21 at 08:07
  • does this have to work on any OS / platform? – gelonida Jan 07 '21 at 00:02
  • It is supposed to be cross-platform. I can handle the `\` vs `/` problem, but I do not want the solution to be platform-specific. If I do not receive any answer within 3 days, I will simply award the bounty to the current answer. My very simple requirements: an in-place, IO-like interface, a single file, and subdirectories. No metadata or anything else is required. – AmaanK Jan 07 '21 at 07:44
  • What you're describing is basically jail mode (chroot, or in the new form of containerization). Have you considered running your app in a container, providing it access just to the resources it needs? – Chen A. Jan 08 '21 at 08:11
  • I am using this to create some sort of Python program, and I am very sure workarounds are not going to work. – AmaanK Jan 09 '21 at 07:22

2 Answers


Solution #1 - TAR file

TAR files are basically a Unix filesystem tree stored in a single file. You can work with them in Python using the tarfile module.

Pros:

  • Works out of the box.
  • Stores POSIX file metadata (permissions, ownership, timestamps, links).
  • tarfile provides stream reader & writer APIs for files.

Cons:

  • Doesn't have non-POSIX features like encryption or memory mapped files.
  • Files can't be edited in-place, you'd have to extract them and then re-add them.
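A minimal sketch of the tar approach with the standard-library tarfile module, covering the points in the pros and cons above: members are written from and read back as streams (so a 3 GB member never has to sit in RAM), and an "edit" can only append a newer copy. The helper names build_archive, iter_member and replace_member are hypothetical, and the archive layout just mirrors the question's example.

```python
import io
import os
import tarfile
import tempfile


def build_archive(path):
    """Create a tar holding testdir/, test.txt and downloads/example1.txt."""
    with tarfile.open(path, "w") as tar:
        d = tarfile.TarInfo("testdir")
        d.type = tarfile.DIRTYPE
        d.mode = 0o755
        tar.addfile(d)
        for name, payload in [("test.txt", b"hello"),
                              ("downloads/example1.txt", b"example")]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)  # tar needs the size before the data
            # addfile copies from the file object in chunks; for a big real file
            # pass open(src, "rb") with info.size = os.path.getsize(src),
            # or simply call tar.add(src, arcname=name).
            tar.addfile(info, io.BytesIO(payload))


def iter_member(path, name, chunk_size=64 * 1024):
    """Yield a member's bytes chunk by chunk without loading it all into RAM."""
    with tarfile.open(path, "r") as tar:
        f = tar.extractfile(name)  # file-like object reading from the archive
        while chunk := f.read(chunk_size):
            yield chunk


def replace_member(path, name, payload):
    """'Edit' a member: tar can only append a newer copy; the old bytes stay."""
    with tarfile.open(path, "a") as tar:  # append mode: uncompressed tar only
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))


archive = os.path.join(tempfile.mkdtemp(), "myfilesystem.tar")
build_archive(archive)
replace_member(archive, "test.txt", b"hello again")
print(b"".join(iter_member(archive, "test.txt")))  # b'hello again' (last copy wins)
```

tarfile treats the last occurrence of a name as the current version, which is why the read above sees the new content. The superseded copy still occupies space, so frequent edits of large files would eventually require rewriting the whole archive.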

Solution #2 - Loopback filesystem

If you can require that mounting is done in order to run your program, you can just use a loopback filesystem:

$ truncate -s 100M /tmp/loopback.ext4
$ mkfs -t ext4 /tmp/loopback.ext4
mke2fs 1.45.5 (07-Jan-2020)
Discarding device blocks: done                            
Creating filesystem with 25600 4k blocks and 25600 inodes

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (1024 blocks): done
Writing superblocks and filesystem accounting information: done

$ sudo mkdir /mnt/loop
$ sudo mount -o loop /tmp/loopback.ext4 /mnt/loop/
$ df -T /mnt/loop
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop11    ext4   93M   72K   86M   1% /mnt/loop
$ sudo tree /mnt/loop/
/mnt/loop/
└── lost+found

1 directory, 0 files

Pros:

  • Used like a regular filesystem.
  • Accessible from outside the python process, offline and online.
  • Very easy to debug.
  • You can add encryption, use memory mapped files, and any other feature of real filesystems.

Cons:

  • Requires root.
  • Requires mounting before running your process.
  • Requires unmounting (at the very least, in case of crashes).
  • Have to set size upfront, resizing possible but not trivial.
  • Very difficult to support cross-platform.

Solution #3 - DIY filesystem

Since you care most about file I/O, you can implement that using BytesIO. To support multiple files in a filesystem hierarchy, you can put those files in a trie. You need to serialize and deserialize all that, for which you can use pickle.

Pros:

  • Easier to customize than a TAR-based solution.
  • Can be made into a library and be nice and reusable.

Cons:

  • Requires more coding on your side.
  • Pickling the whole data structure every time is not scalable.
  • If you need crash safety, you need to pickle after every (relevant) modification to the trie or to any of the files.
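A minimal sketch of the DIY idea, assuming nested dicts are acceptable as the trie (one node per path component) and BytesIO objects as the file leaves. The class names TrieFS and _VFile are hypothetical. As the cons say, the whole tree lives in memory and is pickled as a unit, so this does not meet the 3 GB requirement. Only load containers you created yourself, because unpickling untrusted data can execute code.

```python
import io
import pickle


class _VFile(io.BytesIO):
    """In-memory file stored in the trie. close() only rewinds, so the data
    survives `with fs.open(...) as f:` blocks and can still be saved later."""

    def close(self):
        self.seek(0)


def _dump(node):
    # Directories stay dicts; files become plain bytes for pickling.
    return {k: _dump(v) if isinstance(v, dict) else v.getvalue()
            for k, v in node.items()}


def _restore(node):
    # Inverse of _dump: turn the stored bytes back into file objects.
    return {k: _restore(v) if isinstance(v, dict) else _VFile(v)
            for k, v in node.items()}


class TrieFS:
    """Directories are nested dicts keyed by path component; files are leaves."""

    def __init__(self):
        self.root = {}

    @staticmethod
    def _split(path):
        return [p for p in path.split("/") if p]

    def _walk(self, parts, create):
        node = self.root
        for part in parts:
            if part not in node:
                if not create:
                    raise FileNotFoundError("/".join(parts))
                node[part] = {}
            node = node[part]
            if not isinstance(node, dict):
                raise NotADirectoryError(part)
        return node

    def mkdir(self, path):
        self._walk(self._split(path), create=True)

    def open(self, path):
        *dirs, name = self._split(path)
        parent = self._walk(dirs, create=True)  # creates missing parent dirs
        if name not in parent:
            parent[name] = _VFile()
        f = parent[name]
        if isinstance(f, dict):
            raise IsADirectoryError(path)
        f.seek(0)
        return f

    def listdir(self, path=""):
        return sorted(self._walk(self._split(path), create=False))

    def save(self, fh):
        """Pickle the whole tree into a binary file object."""
        pickle.dump(_dump(self.root), fh)

    @classmethod
    def load(cls, fh):
        fs = cls()
        fs.root = _restore(pickle.load(fh))
        return fs
```

Usage would look like `with fs.open("downloads/example1.txt") as f: f.write(b"...")`, followed by `fs.save(fh)` on a file opened with `open("myfilesystem.fs", "wb")`.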

What to choose

Since your needs are very basic, go for #1 - TAR files.

root
  • Regarding loopback and root requirement: if the admin creates the mountpoint and adds an entry in `/etc/fstab` with the `user` option, then a user can mount/umount a filesystem without special privileges. – VPfB Jan 03 '21 at 08:54

You can use the SVFS package.

SVFS allows you to create a virtual filesystem inside a file on a real filesystem. It can be used to store multiple files inside a single file (with a directory structure). Unlike archives, SVFS allows you to modify files in place. SVFS files use a file-like interface, so they can be used (pretty much) like regular Python file objects. Finally, it's implemented in pure Python and doesn't use any third-party modules, so it should be very portable. Tests show write speeds of around 10-12 MB/s and read speeds of around 26-28 MB/s.
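To illustrate why a block-based container can modify files in place where an archive cannot, here is a toy sketch. It is not SVFS's API or its actual design; BlockContainer and VFile are hypothetical names. The backing file is treated as an array of fixed-size blocks, each virtual file is a list of block indices, and a write touches only the bytes it covers. For brevity the file table is kept in memory only, whereas a real implementation would also store it (plus a free-block list) inside the backing file.

```python
import io


class BlockContainer:
    """Toy single-file container made of fixed-size blocks (hypothetical)."""

    def __init__(self, backing, block_size=4096):
        self.fh = backing            # any seekable binary file object, e.g. open(p, "w+b")
        self.block_size = block_size
        self.files = {}              # name -> {"blocks": [block indices], "size": bytes}
        self.next_block = 0          # next unused block in the backing file

    def alloc_block(self):
        index = self.next_block
        self.next_block += 1
        return index

    def open(self, name):
        entry = self.files.setdefault(name, {"blocks": [], "size": 0})
        return VFile(self, entry)


class VFile(io.RawIOBase):
    """File-like view that maps file offsets onto the container's blocks."""

    def __init__(self, container, entry):
        super().__init__()
        self.c = container
        self.entry = entry
        self.pos = 0

    def readable(self):
        return True

    def writable(self):
        return True

    def seekable(self):
        return True

    def seek(self, offset, whence=io.SEEK_SET):
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self.pos,
                io.SEEK_END: self.entry["size"]}[whence]
        self.pos = base + offset
        return self.pos

    def tell(self):
        return self.pos

    def readinto(self, buf):
        bs = self.c.block_size
        n = min(len(buf), max(0, self.entry["size"] - self.pos))
        done = 0
        while done < n:
            blk, off = divmod(self.pos, bs)
            chunk = min(n - done, bs - off)
            self.c.fh.seek(self.entry["blocks"][blk] * bs + off)
            data = self.c.fh.read(chunk).ljust(chunk, b"\0")  # unwritten gaps read as zeros
            buf[done:done + chunk] = data
            done += chunk
            self.pos += chunk
        return done

    def write(self, data):
        data = bytes(data)
        bs = self.c.block_size
        end = self.pos + len(data)
        while len(self.entry["blocks"]) * bs < end:  # grow: allocate new blocks
            self.entry["blocks"].append(self.c.alloc_block())
        done = 0
        while done < len(data):
            blk, off = divmod(self.pos, bs)
            chunk = min(len(data) - done, bs - off)
            # Overwrite only the touched bytes; nothing else in the container moves.
            self.c.fh.seek(self.entry["blocks"][blk] * bs + off)
            self.c.fh.write(data[done:done + chunk])
            done += chunk
            self.pos += chunk
        self.entry["size"] = max(self.entry["size"], self.pos)
        return done


container = BlockContainer(io.BytesIO(), block_size=16)
f = container.open("test.txt")
f.write(b"hello world")
f.seek(6)
f.write(b"WORLD")  # in-place overwrite of 5 bytes
f.seek(0)
print(f.read())    # b'hello WORLD'
```

Because storage is carved into fixed units when the container is set up, such designs typically size their bookkeeping structures up front, which is consistent with the inode and block parameters discussed in the comments below.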

Canopus
  • extremely great for my use case – AmaanK Jan 09 '21 at 14:38
  • gonna award an extra 100 for that within 24 hours – AmaanK Jan 09 '21 at 14:42
  • Hello, I did not understand the last three numbers in the initialization. Are there some limitations, and if so, what are they? Also, could you please give me a detailed example? – AmaanK Jan 10 '21 at 03:48
  • First number -> inodes(The inode count equals the total number of files and directories in a user account or on a disk. Each file or directory adds 1 to the inode count.). Second number -> [block size](https://stackoverflow.com/a/8537900/11910438). Third number -> [byte size per block](http://www.linuxintro.org/wiki/Blocks,_block_devices_and_block_sizes#:~:text=All%20linux%20blocks%20are%20currently%201024%20bytes.) – Canopus Jan 10 '21 at 07:35
  • Is there a way to make the inode count unlimited? Or can I change the inode count of an already-created SVFS archive? – AmaanK Jan 11 '21 at 05:31
  • I think the inode count can't be changed after creation. – Canopus Jan 11 '21 at 05:40
  • So it might not help me much, but that's OK. I will just award the answer written below, since it is the only one that will work for me, though not in the most efficient way. – AmaanK Jan 12 '21 at 13:27
  • I just found something: I can set the inode count to the block count, which allows the maximum amount for me. – AmaanK Jan 13 '21 at 04:24
  • Okay. I'm glad if it helped you a little – Canopus Jan 13 '21 at 07:44