Monday 19 November 2012

Couchbase smart clients

While doing some research on replacing Memcached with Couchbase for the company I work for, I came across the term "Couchbase smart client". The problem is I couldn't find a decent explanation(in my opinion) for what exactly it does. As you will find out next, the smart client is maybe the most important aspect when it comes to connecting to a Couchbase cluster.
Let's have a look first at what the Couchbase documentation says about smart clients:
When using a smart client, the client library provides an interface to the cluster, and performs server selection directly via the vBucket mechanism. The clients communicate with the cluster using a custom Couchbase protocol which enables the sharing of the vBucket map, and the selection within the client of the required vBucket when obtaining and storing information


I couldn't figure out how that is affecting me as a client of the cluster, so I decided to have a look at the low levels calls that a simple client is doing in order to connect to the cluster and write a key. Below is the source code(PHP):

$servers = array('server1.something.com:8091', 'server2.something.com:8091');
foreach ($servers as $server) {
        $couch = @(new Couchbase( $server )); // uses the default bucket
        if($couch->getResultCode() === Couchbase::SUCCESS) {
                break;
        }
}

$couch->set('kk', 1);

echo "Done";
?>

Let's run strace  for this script:
strace php ./bin/Temp/couchbase_connect.php   2> /tmp/couch_connect

The output is rather verbose, but we are only interested in the following bits:

  • #1 connect(6, {sa_family=AF_INET, sin_port=htons(8091), sin_addr=inet_addr("10.200.100.10")}, 16) = 0
  • #2 sendto(6, "GET /pools/default/bucketsStream"..., 56, 0, NULL, 0) = 56
  • #3 recvfrom(6, "HTTP/1.1 200 OK\r\nTransfer-Encodi"..., 2048, 0, NULL, NULL) = 225
  • #4 connect(7, {sa_family=AF_INET, sin_port=htons(11210), sin_addr=inet_addr("10.200.100.20")}, 16) = -1 EINPROGRESS (Operation now in progress)

The above is a massive simplification of what is happening, but it's enough to give us a clue of what the client is doing:

  • at line 1 the client is connecting to the first server in the pool on port 8091
  • at line 2 the client is requesting information about the servers in the cluster
  • at line 3 the client starts receiving that information
  • at line 4 the client is initiating a connection to the server in the pool that is responsible for the key that we want to write, port 11210

To make this short and sweet, the client is connecting to one of the servers in the pool and by receiving information about the cluster, it is able to decide which of the servers is handling the keys it needs. So what makes the client "smart" is the fact that it has knowledge about the cluster and it makes decisions based on the current state of the cluster.

You can start seeing the advantages of such a smart client when you think about not so smart clients like memcached, where you would have to implement all the logic about the state of the cluster on the client side, which is no trivial task by most people's standards.