Memcached is a distributed object caching system that was originally developed to improve the performance of LiveJournal and has subsequently been used as a scaling strategy for a number of high-load sites. It serves as a large, extremely fast hash table that can be spread across many servers and accessed simultaneously from multiple processes. It’s designed to be used for almost any back-end caching need, and for high performance web applications, it’s a great complement to a database like MySQL.
In a typical environment, a web developer might employ a combination of process-level caching and MySQL's built-in query cache to eke out that extra bit of performance from an application. The problem is that in-process caching is limited to the web process running on a single server. In a load-balanced configuration, each server maintains its own cache, limiting both the efficiency and the effective size of the cache. Similarly, MySQL's query cache is limited to the server the MySQL process runs on, and it can only cache row results. With memcached, you can set up a number of cache servers that can store any type of serialized object, and that data can be shared by all of the load-balanced web servers. Cool, no?
To set up a memcached server, you simply download the daemon and run it with a few parameters. From the memcached web site:
First, you start up the memcached daemon on as many spare machines as you have. The daemon has no configuration file, just a few command line options, only 3 or 4 of which you’ll likely use:
# ./memcached -d -m 2048 -l 10.0.0.40 -p 11211
This starts memcached as a daemon, using 2GB of memory and listening on IP 10.0.0.40, port 11211. A 32-bit process can only address 4GB of virtual memory (usually significantly less, depending on your operating system), so if you have a 32-bit server with 4-64GB of memory using PAE, you can simply run multiple processes on the machine, each using 2 or 3GB of memory.
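For example, several instances can share one box by listening on different ports; clients simply treat them as separate servers. The ports and sizes here are illustrative:

```shell
# Hypothetical example: three 2GB instances on one 32-bit machine with PAE.
# Each instance gets its own port and its own slice of memory.
./memcached -d -m 2048 -l 10.0.0.40 -p 11211
./memcached -d -m 2048 -l 10.0.0.40 -p 11212
./memcached -d -m 2048 -l 10.0.0.40 -p 11213
```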
It’s about as simple as it gets. There’s no real configuration. No authentication. It’s just a gigantor hash table. Obviously, you’d set this up on a private, non-addressable network. From there, the work of querying and updating the cache is completely up to the application designer. You are afforded the basic functions of set, get, and delete. Here’s a simple example in PHP:
$memcache = new Memcache;
$memcache->connect('10.0.0.40', 11211); // one of your memcached servers

$value = 'Data to cache';
$memcache->set('thekey', $value, 0, 60); // third arg is flags; the fourth is the expiry in seconds
echo "Caching for 60 seconds: $value <br>";

$retrieved = $memcache->get('thekey');
echo "Retrieved: $retrieved <br>";
The PHP library takes care of the dirty work of serializing any value you pass to the cache, so you can send and retrieve arrays or even complete data objects.
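Under the hood, this is just PHP's serialize()/unserialize(). A quick sketch of what the client effectively does with a non-scalar value (the array here is made up for illustration):

```php
<?php
// What the Memcache client does internally for non-scalar values:
// serialize on set, unserialize on get.
$user = array('id' => 42, 'name' => 'frank', 'tags' => array('admin', 'editor'));

$stored   = serialize($user);    // this string is what actually goes over the wire
$restored = unserialize($stored); // and this is what get() hands back to you

var_dump($restored == $user);    // bool(true)
```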
In your application’s data layer, instead of hitting the database immediately, you can now query memcached first. If the item is found, there’s no need to hit the database and assemble the data object. If the key is not found, you select the relevant data from the database and store the derived object in the cache. Similarly, you update the cache whenever your data object is altered and updated in the database. Assuming your API is well structured, only a few edits are needed to dramatically improve the scalability and performance of your application.
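A sketch of that read-through pattern, assuming the Memcache extension and mysqli; the function name, table, key scheme, and expiry are all made up for illustration:

```php
<?php
// Hypothetical read-through helper: get_user(), the users table, and the
// "user:" key prefix are illustrative, not part of any real API.
function get_user($memcache, $mysqli, $id) {
    $key = "user:$id";

    // 1. Try the cache first.
    $user = $memcache->get($key);
    if ($user !== false) {
        return $user; // cache hit: no database work at all
    }

    // 2. Cache miss: fall back to the database and assemble the object.
    $stmt = $mysqli->prepare('SELECT id, name, email FROM users WHERE id = ?');
    $stmt->bind_param('i', $id);
    $stmt->execute();
    $user = $stmt->get_result()->fetch_assoc();

    // 3. Store the result for next time (flags = 0, expire in 5 minutes).
    if ($user !== null) {
        $memcache->set($key, $user, 0, 300);
    }
    return $user;
}
```

On writes, do the reverse: update the row in MySQL, then set() the fresh object (or delete() the key) so the next read repopulates the cache.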
I’ve linked to a few resources below where you can find more information on using memcached in your application. In addition to the documentation on the memcached web site, Todd Hoff has compiled a list of articles on memcached and summarized several memcached performance techniques. It’s a pretty versatile tool. For those of you who’ve used memcached, give us a holler in the comments and share your tips and tricks.