It sounds like your implementation needs a simple key-value caching mechanism, and you could use a container like std::unordered_map from C++11, or its Boost cousin, boost::unordered_map. unordered_map provides a hash table implementation. If you need even higher performance at some point, you could also look at Boost.Intrusive, which provides high-performance, standard-library-compatible containers.
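As a minimal sketch of that idea, assuming string keys and values (substitute whatever types your lookups actually produce):

```cpp
#include <string>
#include <unordered_map>

// Minimal in-process key-value cache (C++11). The key/value types are
// placeholders -- swap in whatever your application actually caches.
class SimpleCache {
public:
    // Returns a pointer to the cached value, or nullptr on a miss.
    const std::string* get(const std::string& key) const {
        auto it = entries_.find(key);
        return it == entries_.end() ? nullptr : &it->second;
    }

    // Inserts or overwrites an entry.
    void put(const std::string& key, std::string value) {
        entries_[key] = std::move(value);
    }

private:
    std::unordered_map<std::string, std::string> entries_;
};
```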
If you roll your own cache along those lines, a second concern will be expiring cache entries, since your cached data can grow stale. I don't know what your data is like, but you could implement an expiration strategy like any of these:
- after a certain time/number of uses, expire a cached entry
- after a certain time/number of uses, expire the entire cache (extreme)
- least-recently used (LRU) eviction - there's a Stack Overflow question concerning this: LRU cache design (a sketch follows this list)
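To sketch the LRU option, since it's the one the linked question covers: the classic approach pairs a doubly linked list (recency order) with a hash map indexing into it. Types and capacity here are illustrative:

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// LRU cache sketch: the list holds entries from most- to least-recently
// used, and the map gives O(1) lookup of a key's position in that list.
// Assumes capacity > 0.
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    const std::string* get(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return nullptr;
        // Hit: move the entry to the front (most recently used).
        order_.splice(order_.begin(), order_, it->second);
        return &it->second->second;
    }

    void put(const std::string& key, std::string value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (order_.size() == capacity_) {
            // Evict the least recently used entry (back of the list).
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(key, std::move(value));
        index_[key] = order_.begin();
    }

private:
    using Entry = std::pair<std::string, std::string>;  // key, value
    std::size_t capacity_;
    std::list<Entry> order_;
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

Both get and put are O(1): the splice calls just relink list nodes, so the iterators stored in the map stay valid.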
Multithreaded/concurrent access may also be a concern, though as suggested in the link above, one option is to simply lock the whole cache on each access rather than worry about fine-grained locking.
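For illustration, whole-cache locking around the LruCache sketched above could look like this (note that get copies the value out while the lock is held, since handing a pointer out past the unlock would be unsafe):

```cpp
#include <cstddef>
#include <mutex>
#include <string>

// Coarse-grained locking: one mutex guards the whole cache, so every
// get/put serializes. Simple, and often fast enough; reach for finer
// granularity only if profiling shows contention. Assumes the LruCache
// class from the sketch above.
class ThreadSafeCache {
public:
    explicit ThreadSafeCache(std::size_t capacity) : cache_(capacity) {}

    bool get(const std::string& key, std::string& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        const std::string* value = cache_.get(key);
        if (!value) return false;
        out = *value;  // copy out while the lock is held
        return true;
    }

    void put(const std::string& key, std::string value) {
        std::lock_guard<std::mutex> lock(mutex_);
        cache_.put(key, std::move(value));
    }

private:
    std::mutex mutex_;
    LruCache cache_;
};
```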
Now, if you're talking about scaling up to multiple processes, or distributing server processes across multiple physical machines, simple in-process caching may not be the way to go anymore: each process could hold a different copy of the data at any given time, and performance becomes inconsistent when one server has data cached but the others don't.
That's where Redis/Memcached/Membase/etc. shine - they are built for scaling and for offloading work from a database. A database plus a local in-memory cache can still beat them on raw performance (there is network latency, after all, and a host of other factors), but when it comes to scaling they are very useful: they save load on the database and can serve requests quickly. They also come with features like cache expiration (implementations differ between them).
Best of all? They're easy to use and drop in. You don't have to choose Redis/memcached from the outset; caching itself is just an optimization, and you can quickly swap the caching code from, say, an in-memory cache of your own to Redis or something else.
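One way to keep that swap painless is to code against a small cache interface from the start; the names below are hypothetical, and a Redis-backed implementation would use whichever client library you pick:

```cpp
#include <string>

// Hypothetical backend-agnostic cache interface: call sites depend only
// on this, so the implementation behind it can change later.
class Cache {
public:
    virtual ~Cache() = default;
    virtual bool get(const std::string& key, std::string& out) = 0;
    virtual void put(const std::string& key, const std::string& value) = 0;
};

class InMemoryCache : public Cache { /* wraps the map-based cache above */ };
class RedisCache    : public Cache { /* talks to Redis over the network */ };
```

Construction is then the only place that knows which backend is in play, so switching later is a one-line change plus the new implementation.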
There are still some differences between the caching servers, though - Membase and memcached distribute their data across nodes, while Redis uses master-slave replication.
For the record: I work at a company where we use memcached servers - we have several of them in the data center alongside the rest of our servers, each with something like 16 GB of RAM dedicated entirely to cache.
edit:
And for speed comparisons, I'll adapt something from a Herb Sutter presentation I watched long ago:
- in-process memory -> really fast
- in-memory data from another local process -> still really fast
- data from local disk -> depends on your I/O device; an SSD can be fast, but mechanical drives are glacial
- in-memory data from a remote process -> fast-ish, and your cache servers had better be close
- on-disk data from a remote process -> iceberg