3

Do cache (oblivious|optimal|aware) algorithms typically consider seek time in their model. If not are there examples of models that do consider seek time, and are there analysis of algorithms in this model.

san
  • 4,144
  • 6
  • 32
  • 50
  • I don't understand how you can have a cache-optimal algorithm that performs poorly on seeking... aren't those basically synonymous? – user541686 Jun 24 '14 at 23:44
  • @Mehrdad that is exactly what I want to understand, does cache optimality also imply seek optimality. I am not sure that they do. – san Jun 24 '14 at 23:47
  • 1
    I mean what I don't understand is what you mean by "seek optimality" in the first place... is it the total seek time? The only way to minimize seeking is to make the data more local; it seems impossible to make the data "more" nonlocal but "less" causal of seeking almost by definition, no? – user541686 Jun 25 '14 at 00:22
  • @Mehrdad I think an example will help, say I want to transpose a matrix stored on disk. There are cache optimal algorithms for this. What I am interested to know is whether there are algorithms that also achieve the minimal number of seeks possible according to some model, and whether these cache optimal algorithms themselves achieve that. – san Jun 25 '14 at 00:33

1 Answers1

4

Yes, I've personally written seek sensitive algorithms. File systems certainly use seek sensitive data structures (which are analogous to algorithms).

For example, NTFS (and many other filesystems) store data in B-tree format to minimize seeks and optimize for sequential reads.

Unfortunately, you're asking this question in 2014 when solid state drives and other technology are on the verge of displacing rotating media entirely. SSDs do suffer some seek penalty, but they can handle tens of thousands of seeks per second compared to rotating HDDs which probably can't handle even 100 seeks per second. This makes seeks much less of a problem than in years past.

A similar and more relevant problem is cache-line coherency on the CPU. RAM speed has not kept up with CPU speed, and multicore CPUs have made NUMA a real performance concern. To optimize performance, many algorithm implementations are pushed to use as little memory as possible, and to use nearby memory more often than far memory.

For example, a heap data structure is far more cache friendly than a tree. Performance sensitive code will choose to use heaps over trees when that's a practical choice.

This cache problem has been an overriding performance concern for the last decade at least. Almost all non-trivial programs run faster when optimized for size rather than speed, and cache misses are the cause.

Sophit
  • 491
  • 2
  • 8
  • If you could point me to some model and analysis of algorithms that address seek times that would be great. I am indeed aware of B-trees, other FS' that use them are Reiser and Btrfs. I am looking for a little more, for example, cache optimality style results – san Jun 24 '14 at 23:45
  • After having read both the question and the answer, I still don't understand what seek optimal algorithms are. – Ali Jun 25 '14 at 10:46
  • 1
    Ali, a "seek optimal algorithm" is one that attempts to localize related data so that you don't have to, for example, move a HDD read head across the disk. Seeking costs time and slows down computers. – Sophit Jun 25 '14 at 18:16