
I made a tmpfs filesystem in my home directory on Ubuntu using this command:

$ mount -t tmpfs -o size=1G,nr_inodes=10k,mode=0777 tmpfs space
$ df -h space .
Filesystem                   Size  Used Avail Use% Mounted on
tmpfs                        1.0G  100M  925M  10% /home/user/space
/dev/mapper/ubuntu--vg-root  914G  373G  495G  43% /

Then I wrote this Python program:

#!/usr/bin/env python3

import time
import pickle


def f(fn):
    # Load a pickle from fn and print the elapsed wall-clock time.
    start = time.time()
    with open(fn, "rb") as fh:
        data = pickle.load(fh)
    end = time.time()
    print(str(end - start) + "s")
    return data


obj = list(map(str, range(10 * 1024 * 1024)))  # pickles to approx. 100 MB


def l(fn):
    # Dump the global obj to fn as a pickle.
    with open(fn, "wb") as fh:
        pickle.dump(obj, fh)


print("Dump obj.pkl")
l("obj.pkl")
print("Dump space/obj.pkl")
l("space/obj.pkl")

_ = f("obj.pkl")
_ = f("space/obj.pkl")

The result:

Dump obj.pkl
Dump space/obj.pkl
0.6715312004089355s
0.6940639019012451s

I am confused by this result. Isn't tmpfs a RAM-backed file system, and isn't RAM supposed to be notably faster than any hard disk, including SSDs?

Furthermore, I noticed that this program uses over 15 GB of RAM when I increase the target file size to approx. 1 GB.

How can this be explained?

The background of this experiment is that I am trying to find alternative caching locations to the hard disk and Redis that are faster and available to multiple worker processes.

Green绿色
  • Wouldn't you use `cpickle` if in a hurry? – Mark Setchell Sep 25 '20 at 16:42
  • More of a discussion point than an answer; sorry about the formatting this inflicts. I created a tmpsfs using the same means as you (with the same name under my home, space). `$ time dd if=/dev/zero of=space/test.img bs=1048576 count=100 100+0 records in 100+0 records out 104857600 bytes (105 MB, 100 MiB) copied, 0.0231555 s, 4.5 GB/s real 0m0.030s user 0m0.000s sys 0m0.030s` – tink Sep 25 '20 at 22:19
  • And to SSD: `$ time dd if=/dev/zero of=test.img bs=1048576 count=100 100+0 records in 100+0 records out 104857600 bytes (105 MB, 100 MiB) copied, 0.165582 s, 633 MB/s real 0m0.178s user 0m0.000s sys 0m0.060s` – tink Sep 25 '20 at 22:19
  • Could be python responsible for the time, not the FS/medium of choice. `0m0.030s` vs `0m0.178s` ... seems like a clear winner for tmpfs ... – tink Sep 25 '20 at 22:20
  • 1
    @tink Yes, I can replicate your observations, so probably a Python issue. I would speculate, that maybe it's the reconstruction of the Python data structure that takes most of the time into account, so that the short read times do not alter the total time notably. – Green绿色 Sep 26 '20 at 03:50
  • @MarkSetchell Using `_pickle` instead of `pickle` does not make any difference to the final time measurements. A library called `cpickle` apparently does not exist in Python3. – Green绿色 Sep 26 '20 at 03:56
  • Glad that's settled, then ;) – tink Sep 26 '20 at 04:09
  • Would still be curious of the actual reasons though. – Green绿色 Sep 26 '20 at 04:17
  • @Green - new question? Timing of pickling in Python? – tink Sep 26 '20 at 18:14
  • ok, thanks for your contribution! – Green绿色 Sep 27 '20 at 01:03

1 Answer


Answer following on from the comments:

The elapsed time seems to be a Python thing rather than an effect of the storage medium.
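One way to check this (a sketch, not from the original post) is to separate the raw I/O from the deserialization: read the whole file into memory first, then time `pickle.loads` on the bytes. On both tmpfs and SSD the `loads` step, which rebuilds millions of Python string objects, should dominate, which would explain why the two media produce nearly identical totals.

```python
import pickle
import time


def timed_load(fn):
    """Time the raw read and the deserialization of a pickle file separately."""
    t0 = time.perf_counter()
    with open(fn, "rb") as fh:
        raw = fh.read()           # pure I/O: this is where tmpfs vs. SSD matters
    t1 = time.perf_counter()
    data = pickle.loads(raw)      # pure deserialization: no I/O involved
    t2 = time.perf_counter()
    print(f"read: {t1 - t0:.4f}s  loads: {t2 - t1:.4f}s")
    return data
```

Running `timed_load("obj.pkl")` and `timed_load("space/obj.pkl")` should show similar `loads` times and very different `read` times.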

In a similar set-up (SSD vs. tmpfs), using OS commands on Linux, the speed difference when writing a 100 MB file is notable:

To tmpfs:

$ time dd if=/dev/zero of=space/test.img bs=1048576 count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.0231555 s, 4.5 GB/s

real    0m0.030s
user    0m0.000s
sys 0m0.030s

To SSD:

$ time dd if=/dev/zero of=test.img bs=1048576 count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.165582 s, 633 MB/s

real    0m0.178s
user    0m0.000s
sys 0m0.060s
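On the memory question: a rough back-of-the-envelope sketch (the sizes below are CPython 64-bit assumptions and vary slightly by version) shows why a list of ~10 million short strings occupies far more RAM than its ~100 MB pickled form. Each small `str` carries per-object overhead on top of its payload, and each list slot holds an 8-byte pointer; pickling and unpickling additionally hold temporary buffers on top of that.

```python
import sys

s = "1048575"                  # a typical element of the list
per_str = sys.getsizeof(s)     # object header + character payload
per_slot = 8                   # each list slot is a pointer on 64-bit builds
n = 10 * 1024 * 1024           # number of elements in the original example
est = n * (per_str + per_slot)
print(f"~{est / 2**20:.0f} MiB for the list alone")
```

Scaled up to the ~1 GB pickle, this per-object overhead (plus transient pickler/unpickler buffers) plausibly accounts for the multi-gigabyte RAM usage the question reports.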
tink