Sunday, 18 March 2018

Domain Driven Design Lazy Loading

Lazy loading sounds straightforward, but DDD manages to complicate things. DDD states that entities should not hold references to services or repositories. That is not really debatable, and people who have even a basic understanding of DDD follow this rule without questioning it. So if that is the case, how do you lazy load, given that the entity has to call something to get the data? This is a discussion I had with my good friend Daniel Mouritsen, and it seems that a friend of his even stalked poor Eric Evans to get an answer. Now I can't promise that Eric Evans will be happy with my solution, so it's up to you to decide.


An example

Let's take a very simple example. We have a User entity and a Wallet entity, and a user can have multiple wallets. A user can be looked up by id, but we don't want to load the wallets eagerly; we want to load them only if something actually tries to access them.
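For reference, here is a minimal sketch of the two entities before any lazy loading is introduced (the field names are my own assumptions, not taken from the original example):

// Wallet.java -- a bare-bones wallet entity
public class Wallet {
    private final String id;

    public Wallet(String id) {
        this.id = id;
    }
}

// User.java -- the naive version: the wallets are loaded eagerly together with the user
import java.util.List;

public class User {
    private final String id;
    private List<Wallet> wallets;

    public User(String id, List<Wallet> wallets) {
        this.id = id;
        this.wallets = wallets;
    }

    public List<Wallet> getWallets() {
        return wallets;
    }
}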



Functional to the rescue

While we're not allowed to hold references to other services or repositories in our entities, how about Java 8's Supplier? This does not create a direct dependency between the User and the Wallet repository; the User class has no idea that the Wallet repository even exists. So the User changes slightly:
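The original code isn't embedded here, but a sketch of the change, with illustrative names (the full example linked below has the real code), could look like this:

// User.java -- the wallets are now behind a Supplier; User still knows nothing about repositories
import java.util.List;
import java.util.function.Supplier;

public class User {
    private final String id;
    private Supplier<List<Wallet>> wallets;

    public User(String id) {
        this.id = id;
    }

    public void setWallets(Supplier<List<Wallet>> wallets) {
        this.wallets = wallets;
    }

    public List<Wallet> getWallets() {
        // the supplier is only invoked here, when the wallets are actually accessed
        return wallets.get();
    }
}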


What that means is that we can set the wallets as a lambda expression and, most importantly, that lambda expression is not called unless something accesses the wallets. And here's what we need to do in the repository:
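Again as a sketch, assuming a WalletRepository with a findByUserId method (both names are mine), the wiring could look like this:

// UserRepository.java -- sets the supplier when the user is loaded
public class UserRepository {
    private final WalletRepository walletRepository;

    public UserRepository(WalletRepository walletRepository) {
        this.walletRepository = walletRepository;
    }

    public User findById(String id) {
        User user = loadUser(id); // load only the user itself
        // nothing runs here; the lookup executes only when user.getWallets() is called
        user.setWallets(() -> walletRepository.findByUserId(id));
        return user;
    }

    private User loadUser(String id) {
        // the actual data access is omitted for brevity
        return new User(id);
    }
}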

Pretty straightforward, right? The full example is available here.
If you liked this, don't forget to share!




Thursday, 21 September 2017

Kernel tuning in Kubernetes

Kubernetes is a great piece of technology: it trivialises things that 5 years ago required solid ops knowledge, and it makes DevOps attainable for the masses. More and more people are jumping on the Kubernetes bandwagon nowadays; however, sooner or later they realise that a pod is not exactly a small VM magically managed by Kubernetes.

Kernel tuning

So you've deployed your app in Kubernetes, everything works great, you just need to scale now. What do you do to scale? The naive answer is: add more pods. Sure, you can do that, but pods will quickly hit kernel limits. One particular kernel parameter I have in mind is net.core.somaxconn. This parameter represents the maximum number of connections that can be queued for acceptance. The default value on Linux is 128, which is rather low:
root@x:/# sysctl -a | grep "net.core.somaxconn"
net.core.somaxconn = 128
You might get away with not increasing it, but I believe it's wasteful to create new pods unless there is a CPU or memory need.

In order to update a sysctl parameter in a normal Linux VM, the following command just works:
root@x:/# sysctl -w net.core.somaxconn=10000    
However, try it in a pod and you get this:
root@x:/# sysctl -w net.core.somaxconn=10000  
sysctl: setting key "net.core.somaxconn": Read-only file system
Now you start realising that this is not your standard Linux VM where you can do whatever you want; this is Kubernetes' turf and you have to play by its rules.

Docker image baking

At this point you're probably wondering why I'm not just baking the sysctl parameters into the Docker image. The Docker image could be as simple as this:
FROM alpine
#Increase the number of connections
RUN echo "net.core.somaxconn=10000" >> /etc/sysctl.conf
Well, the bad news is that it doesn't work: as soon as you deploy the app and connect to the pod, the kernel parameter is still 128. A container doesn't go through the boot sequence that normally applies /etc/sysctl.conf, so the file is effectively ignored. Variations on the image-baking theme can be attempted, but I could not make any of them work and I have a strong feeling that it's not doable this way.

Kubernetes sysctl

There is documentation around sysctl in Kubernetes here. So Kubernetes acknowledges that kernel tuning is sometimes required and provides explicit support for it. Then it should be as easy as following the documentation, right? Not quite: the documentation is a bit vague and it didn't quite work for me. I'm not creating pods directly as described in the documentation; I am using deployments.

After a bit of research, I did find a way to do it, via init containers. Init containers are specialised containers that run before the app containers and can contain utilities or setup scripts not present in the app image. Let's see an example:
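Here is a sketch of what such a deployment could look like, saved as ubuntu-deployment.yml. The init container runs privileged so that the sysctl writes are allowed; the image name matches the one built below, everything else is illustrative:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ubuntu-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ubuntu-test
  template:
    metadata:
      labels:
        app: ubuntu-test
    spec:
      initContainers:
      # runs before the app container and tunes the pod's network namespace
      - name: sysctl-tuner
        image: busybox
        command: ["sh", "-c", "sysctl -w net.core.somaxconn=10000 && sysctl -w net.ipv4.ip_local_port_range='1024 65535'"]
        securityContext:
          privileged: true
      containers:
      - name: ubuntu-test
        image: ubuntu-test-image
        imagePullPolicy: IfNotPresent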



In order for this to work you will need to create a custom Ubuntu image that never terminates. It's a bit of a hack, I know, but the point is to keep the pod running so that we can connect and verify that the sysctl changes were applied. This is the Dockerfile:
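A minimal version could look like this (the endless sleep is just there to keep the container, and therefore the pod, alive):

FROM ubuntu
# do nothing forever so the pod stays up and we can exec into it
CMD ["sleep", "infinity"]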

In order to test this we have to build the image first:
docker build -t ubuntu-test-image .
I am using minikube locally, for testing, so I'm creating the deployment like this:
kubectl apply -f ubuntu-deployment.yml
Against a real Kubernetes cluster, the kubeconfig for that cluster has to be provided with the command.
Once the deployment is created, let's log into the pod and inspect the changes. First we need to find the pod name:
bogdan@x:/$ kubectl get pods | grep ubuntu-test
ubuntu-test-1420464772-v0vr1     1/1       Running            0          11m
Once we have the pod name, we can log into the pod like this:
kubectl exec -it ubuntu-test-1420464772-v0vr1 -- /bin/bash
Then check the sysctl parameters:
:/#  sysctl -a | grep "net.core.somaxconn" 
net.core.somaxconn = 10000 

:/# sysctl -a | grep "net.ipv4.ip_local_port_range" 
net.ipv4.ip_local_port_range = 1024     65535
As you can see, the changes were applied.

Sunday, 24 July 2016

Testing Apache Camel routes

Running Apache Camel on top of Spring is quite popular nowadays and there is a multitude of resources on how to create the routes. However, sooner or later you will want to test the routes, and this is where people usually hit a brick wall, because the documentation is a bit confusing. If you're still trying to find a good way to test your routes, then read on.

A simple route

We are using 3 queues: seda:groceries, seda:fruits and seda:vegetables. In case you're wondering what SEDA is, it's just an in-memory queue provided by Camel and it can be replaced by anything else like JMS or RabbitMQ, the principle stays the same.
The type of each object pushed into the groceries queue is checked via Camel's choice() construct: fruits are redirected to the fruits queue and vegetables to the vegetables queue. The fruits queue is processed by FruitProcessor and the vegetables queue by VegetableProcessor.
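The original route builder is in the linked project; a sketch of it, with the processors injected through the constructor (an assumption on my part), could look like this:

import org.apache.camel.Processor;
import org.apache.camel.builder.RouteBuilder;

public class GroceriesRouteBuilder extends RouteBuilder {

    private final Processor fruitProcessor;
    private final Processor vegetableProcessor;

    public GroceriesRouteBuilder(Processor fruitProcessor, Processor vegetableProcessor) {
        this.fruitProcessor = fruitProcessor;
        this.vegetableProcessor = vegetableProcessor;
    }

    @Override
    public void configure() throws Exception {
        // route each grocery to the right queue based on its type
        from("seda:groceries")
            .choice()
                .when(body().isInstanceOf(Fruit.class)).to("seda:fruits")
                .when(body().isInstanceOf(Vegetable.class)).to("seda:vegetables")
            .end();

        from("seda:fruits").process(fruitProcessor);
        from("seda:vegetables").process(vegetableProcessor);
    }
}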



The test setup

In order to test the routes we need to start Camel, which is not really possible using the production configuration, since Camel would connect to various third-party services. We also need to make sure that we don't run the real processors, because we only want to test the routes.

This is the test setup, which is actually the complicated part of all this:
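The real setup is in the linked project; the sketch below approximates it under the assumptions I've already made (a GroceriesRouteBuilder taking two processors), with the processor beans replaced by processors that only forward to mock endpoints, so that assertions can be made on those MockEndpoints:

import org.apache.camel.EndpointInject;
import org.apache.camel.Produce;
import org.apache.camel.ProducerTemplate;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.component.mock.MockEndpoint;
import org.apache.camel.test.spring.CamelSpringJUnit4ClassRunner;
import org.apache.camel.test.spring.SingleRouteCamelConfiguration;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.test.context.ContextConfiguration;

@RunWith(CamelSpringJUnit4ClassRunner.class)
@ContextConfiguration
public class GroceriesRouteTest {

    // used to push messages into the groceries queue
    @Produce(uri = "seda:groceries")
    private ProducerTemplate groceries;

    // the "mock processors" we make assertions against
    @EndpointInject(uri = "mock:fruitProcessor")
    private MockEndpoint fruitProcessor;

    @EndpointInject(uri = "mock:vegetableProcessor")
    private MockEndpoint vegetableProcessor;

    @Configuration
    public static class ContextConfig extends SingleRouteCamelConfiguration {
        @Bean
        public RouteBuilder route() {
            // the real route builder, wired with processors that just forward to mock endpoints
            return new GroceriesRouteBuilder(
                    exchange -> exchange.getContext().createProducerTemplate()
                            .send("mock:fruitProcessor", exchange),
                    exchange -> exchange.getContext().createProducerTemplate()
                            .send("mock:vegetableProcessor", exchange));
        }
    }

    // the test methods shown further down go here
}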



A few essential points for the test setup:
  1. We are using CamelSpringJUnit4ClassRunner to run the tests
  2. The test has its own configuration, defined in the ContextConfig inner class, which extends SingleRouteCamelConfiguration
  3. The route() method in ContextConfig() defines the route builder class that we are testing
  4. All the processors are defined as beans and they are mocked by using Camel's MockEndpoint. That allows us to replace the actual processors with mocks and make assertions on how the messages are travelling through the routes.
  5. The mock processors are autowired into the test class so that assertions can be made against them

The tests

After the setup is complete, the actual tests are quite straightforward:
  1. When I push a Fruit object into the groceries queue, I would like the FruitProcessor to process the message
  2. When I push a Vegetable object into the groceries queue, I would like the VegetableProcessor to process the message
This is the actual code:
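As a sketch (the Fruit and Vegetable constructors are assumptions), the two test methods could look like this, living inside the test class from the setup above:

@Test
public void fruitShouldBeProcessedByFruitProcessor() throws Exception {
    fruitProcessor.expectedMessageCount(1);
    vegetableProcessor.expectedMessageCount(0);

    groceries.sendBody(new Fruit("apple"));

    fruitProcessor.assertIsSatisfied();
    vegetableProcessor.assertIsSatisfied();
}

@Test
public void vegetableShouldBeProcessedByVegetableProcessor() throws Exception {
    vegetableProcessor.expectedMessageCount(1);
    fruitProcessor.expectedMessageCount(0);

    groceries.sendBody(new Vegetable("carrot"));

    vegetableProcessor.assertIsSatisfied();
    fruitProcessor.assertIsSatisfied();
}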
There are many ways to go about testing the routes, but the setup and test strategy remain the same. I hope you found this useful; if you did, then please share. The maven project is available here.


Thursday, 12 March 2015

Reduce your git repository size in minutes

It is a fact that your git repository accumulates a lot of history. Even though git was not built for binary files, people do store them in repositories, and that contributes to the growth. At a certain point you might remove those binary files, and looking back at the history of an image is not something you do every day. So why not remove all the massive blobs from history? I know, it sounds like you need to rewrite history, and that is dangerous, isn't it? Not quite, thanks to a nice tool called bfg.

Right, let's start:

1. Download bfg or install it via brew, yum, etc.

2. Create a bare clone of your git repository:
git clone --mirror git://something.com/big-repo.git

3. Create a backup of the repository (just in case)
cp -r big-repo.git big-repo.git_bak

4. Run this:
bfg -b 100K big-repo.git

This will remove all files over 100K, but don't worry, HEAD is protected. There are many other options (including protecting other branches); have a look at their documentation or just run bfg with no arguments to see the options.

5.  Run git gc to actually remove the files

cd big-repo.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive

6. [Optional] Create a new repo where you push the changes. I like to push the changes into a new repo to be 100% sure that the repository is in a good state. Before pushing, change the url for remote "origin" inside big-repo.git/config.

7. Push the changes:
git push

8. Done. Enjoy your lean repository!

Monday, 30 June 2014

Enterprise integration - PHP and Java over Thrift

Sometimes integration between different platforms is a must. In an ideal world you would roll out the same platform across all your infrastructure, but in reality that is not always possible. The first thing that springs to mind when you have different platforms is web services such as REST, SOAP or XML-RPC (if you have lived in a cave for the past 10 years - no offense to those who still run it for "legacy" reasons). It's natural to think of these solutions, since most people nowadays publish their APIs over HTTP. Communicating over HTTP is slow, which is perfectly fine for third-party integrations in most cases, because there are no great expectations performance-wise. But what if the integration is done internally and two pieces of your infrastructure, running on top of different platforms, need to be integrated? Is it acceptable that for every HTTP request to your application you make another HTTP request? If it is, then please stop reading this, I am wasting your time. If it's not, then read on.
There is a better way for integration and that is what giants such as Google or Facebook are doing. Google has Protocol Buffers and Facebook has Apache Thrift. Let's have a quick look at both.

What Are Protocol Buffers?

Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages – Java, C++, or Python.

What is Apache Thrift?

The Apache Thrift software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml and Delphi and other languages.

Java+PHP, Protocol Buffers or Thrift?

The answer is obvious: of the two, only Thrift supports PHP. If you look at the list of languages supported by Thrift, it will be the natural choice for many. Facebook developed it internally and it works great for them at a massive scale, which means it should be brilliant for you also.
In order to be able to use Thrift, you must download and install it; please follow the instructions on the official site. Please note that you don't need Thrift installed to run my example, just clone the repository from Github: git clone https://github.com/bogdanalbei/KrunchyKreme.git

Java Server, PHP client

Let's say that we want to consume the services of a Java application from PHP. The reverse is also possible and quite easy to accomplish, as you will see. The Java application is a doughnut vending machine, and the client has the possibility of having a look at what's available and ordering.
The most essential part of understanding Thrift is understanding its Interface Definition Language (IDL). Based on the IDL file(s), Thrift will generate your server and your client in any of the supported languages. Let's have a look at our IDL file:
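The real IDL file is in the repository; a sketch reconstructed from the description below could look like this (the Doughnut fields are my assumptions):

# a sketch, not the actual file from the repository
struct Doughnut {
    1: i32 id,
    2: string name,
    3: double price
}

service KrunchyKreme {
    list<Doughnut> getMenu(),
    void order(1: i32 doughnutId, 2: i32 quantity)
}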

The above defines the KrunchyKreme service and two methods: getMenu and order. getMenu takes no parameters and returns a list of Doughnut (the definition of Doughnut begins with "struct Doughnut"). The order method allows the client to place an order for a specific doughnutId and a quantity. Please note that the data types are platform independent and they get translated into the appropriate types depending on the compilation language. You can read about Thrift IDL files here later if you wish.

Now the fun bit, let's compile our Java server and PHP client:
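Assuming the IDL file is called KrunchyKreme.thrift (the actual file name may differ in the repository), the generation commands look like this:

thrift --gen java KrunchyKreme.thrift
thrift --gen php KrunchyKreme.thrift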

The Java files will be generated into the gen-java folder and the PHP files into the gen-php folder. In my sample project I have conveniently created generate-java and generate-php scripts that also copy the files into the right folder structure for the Java and PHP projects.
You can generate your clients and servers in any of the supported languages, which is pretty good if you ask me. 

 

The Java server

My sample project contains the Java server. It uses Maven and you can build it by running the build script bin/build.sh, which just executes "mvn package". This is the server:
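The real server is in the repository; a sketch of a single-threaded Thrift server, using the classes Thrift generates from the IDL above, could look like this:

import org.apache.thrift.server.TServer;
import org.apache.thrift.server.TSimpleServer;
import org.apache.thrift.transport.TServerSocket;
import org.apache.thrift.transport.TServerTransport;

public class Server {
    public static void main(String[] args) throws Exception {
        // KrunchyKreme.Processor is generated by Thrift; the handler is our own code
        KrunchyKremeHandler handler = new KrunchyKremeHandler();
        KrunchyKreme.Processor<KrunchyKremeHandler> processor =
                new KrunchyKreme.Processor<>(handler);

        TServerTransport serverTransport = new TServerSocket(9090);
        TServer server = new TSimpleServer(new TServer.Args(serverTransport).processor(processor));

        System.out.println("Starting the KrunchyKreme server on port 9090...");
        server.serve();
    }
}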

All this does is start a single-threaded server that handles requests by using the KrunchyKremeHandler class:
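Again as a sketch (the menu contents are made up), the handler implements the generated KrunchyKreme.Iface interface:

import java.util.ArrayList;
import java.util.List;

public class KrunchyKremeHandler implements KrunchyKreme.Iface {

    @Override
    public List<Doughnut> getMenu() {
        System.out.println("getMenu() called");
        List<Doughnut> menu = new ArrayList<>();
        menu.add(new Doughnut(1, "Glazed", 1.5));
        menu.add(new Doughnut(2, "Chocolate", 2.0));
        return menu;
    }

    @Override
    public void order(int doughnutId, int quantity) {
        System.out.println("order() called: doughnutId=" + doughnutId + ", quantity=" + quantity);
    }
}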

If you are wondering where KrunchyKreme.Iface came from, the answer is easy: it is generated by Thrift. So you get an interface for your server; all you need to do now is write the actual code. Easy, right?

Once the server is built, you can start it by running ./bin/server.sh in the server's folder. It will start listening on port 9090 and will output debug information when clients connect and call its services.

The PHP client

You can find the sample client here. The project is based on Composer, so you will have to install the dependencies by running php composer.phar install.

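The real client is in the repository; a sketch of it, using Thrift's PHP library from Composer (the host, port and exact class loading are assumptions), could look like this:

<?php
require_once __DIR__ . '/vendor/autoload.php';

use Thrift\Protocol\TBinaryProtocol;
use Thrift\Transport\TBufferedTransport;
use Thrift\Transport\TSocket;

function getClient()
{
    // open a buffered binary connection to the Java server
    $socket = new TSocket('localhost', 9090);
    $transport = new TBufferedTransport($socket);
    $protocol = new TBinaryProtocol($transport);
    $client = new KrunchyKremeClient($protocol); // generated by Thrift
    $transport->open();
    return $client;
}

$client = getClient();

// display the menu
var_dump($client->getMenu());

// place a few orders
$client->order(1, 2);
$client->order(2, 1);
$client->order(3, 5);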
The getClient() method initializes the client. Once that is done, getMenu() is called and three successive orders are placed by calling order(..). Please note the KrunchyKremeClient class, which was generated by Thrift. It gives us all the methods available to call on the server; all we have to do is use it.
You can run the client with php client.php in the client folder. This is the output:

As expected, we display the menu and then output the results of the orders. Please note the response times: 3ms to connect, 3ms to get the menu and 1ms for each order. I am running this on a basic DigitalOcean machine with 1GB of RAM, 1 processor and an SSD. Fair enough, this is a simple example, but try this over HTTP. I don't have numbers for it, but I would expect it to be about ten times worse for RESTful web service calls. Not to mention the resources used: a web server that serves web services will be much more expensive than a simple standalone server. This is why Google and Facebook don't do much REST internally, relying instead on Protocol Buffers and Thrift.

In conclusion, is the complexity worth it? It really depends on your needs, but I believe we will see Thrift more and more.

Friday, 31 May 2013

Updating InfusionSoft Custom Fields

This is a quick post, and I'm writing it mainly because I wasted a lot of time trying to programmatically update custom fields in Infusionsoft. So here goes: if you need to update a custom field, you need to know the id of the field and the new value:

 updateCustomField($fieldId, $fieldValues)  

You can find the id of the field by going to Admin -> Settings -> Set up custom fields for -> Go; when you hover over the field you are interested in, you will see the id.
Getting the value is trickier. What works on any type of field is setting a value manually and then querying DataFormField for the value like this:
  
 $returnFields = array('DataType', 'Id', 'FormId', 'GroupId', 'Name', 'Label', 'DefaultValue', 'Values', 'ListRows');   
 $query = array('Id' => custom_field_id);   
 $res = $sdk->dsQuery("DataFormField", 10, 0, $query, $returnFields);   
 var_dump($res);   
In the result above, have a look at the "Values" field; this is how it should look.
So to conclude, if you need to update a listbox custom field, this will do it: 

 $values = array(  
     'Values' => "\naaa\nbbb\nddd"  
 );  
 $result = $sdk->updateCustomField(custom_field_id, $values);  

Monday, 19 November 2012

Couchbase smart clients

While doing some research on replacing Memcached with Couchbase for the company I work for, I came across the term "Couchbase smart client". The problem is that I couldn't find a decent explanation (in my opinion) of what exactly it does. As you will find out next, the smart client is maybe the most important aspect when it comes to connecting to a Couchbase cluster.
Let's have a look first at what the Couchbase documentation says about smart clients:
When using a smart client, the client library provides an interface to the cluster, and performs server selection directly via the vBucket mechanism. The clients communicate with the cluster using a custom Couchbase protocol which enables the sharing of the vBucket map, and the selection within the client of the required vBucket when obtaining and storing information


I couldn't figure out how that affects me as a client of the cluster, so I decided to have a look at the low-level calls that a simple client makes in order to connect to the cluster and write a key. Below is the source code (PHP):

<?php
$servers = array('server1.something.com:8091', 'server2.something.com:8091');

// try each server in the pool until a connection succeeds
foreach ($servers as $server) {
        $couch = @(new Couchbase( $server )); // uses the default bucket
        if($couch->getResultCode() === Couchbase::SUCCESS) {
                break;
        }
}

$couch->set('kk', 1);

echo "Done";
?>

Let's run strace for this script:
strace php ./bin/Temp/couchbase_connect.php   2> /tmp/couch_connect

The output is rather verbose, but we are only interested in the following bits:

  • #1 connect(6, {sa_family=AF_INET, sin_port=htons(8091), sin_addr=inet_addr("10.200.100.10")}, 16) = 0
  • #2 sendto(6, "GET /pools/default/bucketsStream"..., 56, 0, NULL, 0) = 56
  • #3 recvfrom(6, "HTTP/1.1 200 OK\r\nTransfer-Encodi"..., 2048, 0, NULL, NULL) = 225
  • #4 connect(7, {sa_family=AF_INET, sin_port=htons(11210), sin_addr=inet_addr("10.200.100.20")}, 16) = -1 EINPROGRESS (Operation now in progress)

The above is a massive simplification of what is happening, but it's enough to give us a clue of what the client is doing:

  • at line 1 the client is connecting to the first server in the pool on port 8091
  • at line 2 the client is requesting information about the servers in the cluster
  • at line 3 the client starts receiving that information
  • at line 4 the client is initiating a connection to the server in the pool that is responsible for the key that we want to write, port 11210

To make this short and sweet, the client is connecting to one of the servers in the pool and by receiving information about the cluster, it is able to decide which of the servers is handling the keys it needs. So what makes the client "smart" is the fact that it has knowledge about the cluster and it makes decisions based on the current state of the cluster.

You can start seeing the advantages of such a smart client when you think about not-so-smart clients like Memcached, where you would have to implement all the logic about the state of the cluster on the client side, which is no trivial task by most people's standards.