
I have just finished reading the Google File System (GFS) paper. The paper says that GFS is

optimized for appending operations rather than random writes. 

Seeing that this characteristic is emphasized throughout the paper, I take it that it must be very important.

As a student who has had no working experience at all, what are some real-life examples of such Appending Operations that Google speaks of? It sounds pretty intense.

Some Noob Student

1 Answer


It is a central limitation of the Google File System, and one that contrasts it with general-purpose parallel file systems like GPFS. However, it makes the design a lot easier when it comes, e.g., to replication. Since Google is able to design its applications around its file system, and because random operations are inherently slow (on rotating media), this is fine for them.

Tons of things are "append" operations:

  • New log entries are appended to a log file. (GoogleFS can even append to an already closed file, with certain limitations; the very similar HDFS, http://hadoop.apache.org/hdfs/, is not able to do that.)
  • New web crawl data is appended to a crawl file instead of overwriting existing versions of the crawl within a file.
  • All MapReduce outputs (you should also read that paper) are written from the beginning of a file to the end, appending key/value pairs to the file(s).
  • ...
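To make the log-file case concrete, here is a minimal sketch in plain Python on a local file system (illustrative only; GoogleFS and HDFS expose their append semantics through their own client APIs, not the POSIX interface shown here):

```python
# Sketch of append-only logging: every writer only ever adds
# records at the end of the file, never rewrites earlier bytes.
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "app.log")

# Open in append mode and add a record.
with open(log_path, "a") as log:
    log.write("2024-01-01T00:00:00 request handled\n")

# Reopen later and append again -- the earlier record is untouched.
with open(log_path, "a") as log:
    log.write("2024-01-01T00:00:01 request handled\n")

with open(log_path) as log:
    print(log.read().count("request handled"))  # -> 2
```

This access pattern is what makes replication simple: replicas only ever need to agree on what gets added at the end, never on concurrent updates to the middle of the file.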

All writes to a file that do not update data in the middle of the file using a seek or a pwrite operation are appends. The most important user of random writes is the (classical) database backend.
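The distinction can be sketched with POSIX-style file operations in Python (local file system only; the pattern, not the API, is what carries over to GFS):

```python
# Append vs. random (in-place) write, using low-level os calls.
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.bin")
fd = os.open(path, os.O_RDWR | os.O_CREAT)

os.write(fd, b"hello world")       # initial content

# Append: seek to the current end of the file and write there.
os.lseek(fd, 0, os.SEEK_END)
os.write(fd, b"!")

# Random write: pwrite updates bytes in the middle of the file --
# exactly the pattern GFS is NOT optimized for.
os.pwrite(fd, b"HELLO", 0)

os.lseek(fd, 0, os.SEEK_SET)
print(os.read(fd, 64))             # -> b'HELLO world!'
os.close(fd)
```

A database backend does the second kind of write constantly (updating a row in place inside a page), which is why classical databases are a poor fit for an append-optimized file system.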

dmeister