
In bioinformatics, we have been working more and more with cluster-based deployments like Kubernetes, Spark, and Hadoop. The term POSIX storage keeps coming up in documentation.

What is the difference between POSIX storage and NFS block storage (EBS)? Are the terms interchangeable? Does it basically mean anything that isn't object storage (S3) or Microsoft (SMB, CIFS)?

OneCricketeer
Kermit
  • This should be moved to Super User – OneCricketeer Sep 13 '18 at 19:37
  • Why does it have to be moved to Super User? The difference between POSIX semantics and NFS semantics is critical when writing software that is supposed to be correct on a file system with weaker semantics (e.g. NFS) – dmeister Mar 17 '19 at 02:42

1 Answer


My understanding is:

POSIX storage refers to any storage that can be accessed using POSIX filesystem functions (i.e. the usual 'open', 'read' and 'write' calls, or C's 'fopen' on top of them) and that complies with POSIX filesystem requirements. This means it must provide several facilities, such as POSIX file attributes and atomic file locking that strictly follows POSIX semantics.
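As a rough illustration of what "accessed using POSIX filesystem functions" looks like in practice, here is a minimal Python sketch that goes through thin wrappers around the POSIX calls (open, ftruncate, write, fsync, pread) and takes an advisory lock via fcntl. The function name write_locked is made up for this example, and it assumes a Unix-like host where Python's fcntl module is available.

```python
import fcntl
import os


def write_locked(path: str, payload: bytes) -> bytes:
    """Replace the file's contents under an exclusive advisory lock,
    then read the bytes back. Hypothetical helper for illustration."""
    # open(2): create the file if it is missing, open it read/write
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Advisory lock on the whole file; blocks until it is granted
        fcntl.lockf(fd, fcntl.LOCK_EX)
        try:
            os.ftruncate(fd, 0)      # ftruncate(2): drop old contents
            os.write(fd, payload)    # write(2): fine for small payloads
            os.fsync(fd)             # fsync(2): flush to the storage device
            # pread(2): read at offset 0 without moving the file offset
            return os.pread(fd, len(payload), 0)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Whether the lock is actually honoured across hosts depends on the filesystem behind the path, which is the kind of difference the rest of this answer is about.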

This is normally storage that is attached to the host (either directly or via a SAN) through a POSIX operating system. In addition, the filesystem has to be POSIX-capable.

NFS, CIFS, other NAS filesystems, as well as HDFS (Hadoop), are not fully POSIX compatible. They work on top of network protocols, usually backed by some other filesystem, and their access semantics don't allow for full POSIX compatibility (but see @SteveLoughran's note in the comments: NFS is mostly POSIX).

NTFS and FAT are filesystems, but they are not POSIX capable (they don't support locking with the same semantics). Windows doesn't provide POSIX-compatible functions either, and even Linux cannot be fully POSIX-storage-compatible on these filesystems. They are not "POSIX storage".

Amazon EBS volumes are block storage (SAN), so once a volume is attached to your host, if the filesystem you use is POSIX, and you are running a POSIX operating system, you can consider it "POSIX storage".

S3 is not a filesystem. It has its own object access API, and hence it cannot support POSIX file functions.

Most typical Linux filesystems (when mounted directly by a POSIX host) are POSIX capable (e.g. ext3, ext4, xfs, zfs).
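If you want a quick feel for how a given mount behaves, a small probe can exercise a few of the behaviors discussed in this thread: overwriting bytes in place, seeking past the end of a file and writing, renaming over an existing file, and renaming a file while it is open. The sketch below is only that; probe_posix_behaviors is a made-up name, and passing these four checks does not prove POSIX compliance (it says nothing about locking or concurrent access from several hosts, for instance).

```python
import os
import tempfile


def probe_posix_behaviors(directory: str) -> dict:
    """Run a few single-host checks inside `directory` and report
    which ones succeeded. Hypothetical helper, not a compliance test."""
    results = {}
    with tempfile.TemporaryDirectory(dir=directory) as work:
        target = os.path.join(work, "target")
        with open(target, "wb") as f:
            f.write(b"aaaaaaaa")

        # 1. Overwrite bytes in the middle of an existing file
        try:
            with open(target, "r+b") as f:
                f.seek(2)
                f.write(b"ZZ")
            with open(target, "rb") as f:
                results["overwrite_in_place"] = f.read() == b"aaZZaaaa"
        except OSError:
            results["overwrite_in_place"] = False

        # 2. Seek past the current end of the file and write there
        try:
            with open(target, "r+b") as f:
                f.seek(16)
                f.write(b"!")
            results["write_past_end"] = os.path.getsize(target) == 17
        except OSError:
            results["write_past_end"] = False

        # 3. Rename a new file over an existing one
        try:
            tmp = os.path.join(work, "target.tmp")
            with open(tmp, "wb") as f:
                f.write(b"new")
            os.rename(tmp, target)
            with open(target, "rb") as f:
                results["rename_over_existing"] = f.read() == b"new"
        except OSError:
            results["rename_over_existing"] = False

        # 4. Rename a file while a handle to it is still open
        try:
            held = os.path.join(work, "held")
            with open(held, "wb") as f:
                f.write(b"held")
            with open(held, "rb") as f:
                os.rename(held, os.path.join(work, "moved"))
                results["rename_while_open"] = f.read() == b"held"
        except OSError:
            results["rename_while_open"] = False
    return results
```

Calling probe_posix_behaviors("/some/mount/point") (the path is a placeholder) returns a dict of booleans; on a locally mounted ext4 or xfs volume you would expect all four to be True.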

jjmontes
  • FWIW, Hadoop isn't just a network protocol, HDFS is a filesystem. But it doesn't support seek() and write past the end of a file, or the ability to write to an existing file other than append. So it isn't POSIX. NFS is mostly POSIX; [Sandberg86], "The Sun Network Filesystem: Design, Implementation and Experience" covers that. – stevel Sep 17 '18 at 12:23
  • @SteveLoughran The info about Hadoop was added by another user (cricket_007), but I have edited the answer to point that out. – jjmontes Sep 17 '18 at 14:23
  • Thanks a lot for your answer, it's very simple and effective. Could you also give an example of why NTFS and FAT are not POSIX capable? – Mehdi TAZI May 23 '19 at 12:22
  • I'm not an expert, but FAT doesn't support some locking semantics (like atomic renames while overwriting a file) and doesn't even have enough POSIX metadata (e.g. file access time is very coarse). NTFS supports metadata that cannot be handled from a POSIX context (like reparse points), which makes a filesystem look different from a POSIX client, and when mounted by a Windows host, you cannot rename open files. Personally, I've experienced severe stranded file-lock issues when serving NTFS volumes through NFS and accessing them concurrently. – jjmontes May 23 '19 at 14:31
  • Great answer, just a correction about NFS, quote from AWS documentation: "Amazon EFS provides elastic, shared file storage that is POSIX-compliant." https://docs.aws.amazon.com/efs/latest/ug/creating-using.html – Roman Sinyakov Dec 26 '21 at 17:01