Thursday, 12 March 2015

Reduce your git repository size in minutes

It is a fact that your git repository accumulates a lot of history. Even though git was not built for binary files, people do store them in repositories, and that contributes to the growth. At some point you might remove those binary files, and looking back at the history of an image is not something you do every day. So why not remove all the massive blobs from history? I know, it sounds like you need to rewrite history, and that is dangerous, isn't it? Not quite, with a nice tool called bfg (the BFG Repo-Cleaner).

Right, let's start:

1. Download bfg or install it via brew, yum, etc.

2. Create a bare clone of your git repository:
git clone --mirror git://

3. Create a backup of the repository (just in case):
cp -r big-repo.git big-repo.git_bak

4. Run this:
bfg -b 100K big-repo.git

This will remove all files over 100K from the history, but don't worry, HEAD is protected: the files in your latest commit are left untouched. There are many other options (including protecting other branches); have a look at the bfg documentation or just run bfg with no arguments to see them.

5.  Run git gc to actually remove the files

cd big-repo.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive

6. [Optional] Create a new repo where you push the changes. I like to push the changes into a new repo to be 100% sure that the repository is in a good state. Before pushing, change the url for remote "origin" inside big-repo.git/config.

7. Push the changes:
git push

8. Done. Enjoy your lean repository!

Monday, 30 June 2014

Enterprise integration - PHP and Java over Thrift

Sometimes integration between different platforms is a must. In an ideal world you would roll out the same platform across all your infrastructure, but in reality that is not always possible. The first thing that springs to mind when you have different platforms is web services such as REST, SOAP or XML-RPC (if you have lived in a cave for the past 10 years; no offense to those who still run it for "legacy" reasons). It's natural to think of these solutions, since most people nowadays publish their APIs over HTTP. Communicating over HTTP is comparatively slow, which is perfectly fine for most third-party integrations, because nobody expects great performance there. But what if the integration is internal, and two pieces of your infrastructure that run on different platforms need to talk to each other? Is it acceptable that for every HTTP request to your application you make another HTTP request? If it is, then please stop reading this, I am wasting your time. If it's not, then read on.
There is a better way to integrate, and it is what giants such as Google and Facebook are doing. Google has Protocol Buffers and Facebook has Thrift, now an Apache project. Let's have a quick look at both.

What Are Protocol Buffers?

Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages – Java, C++, or Python.

What is Apache Thrift?

The Apache Thrift software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml and Delphi and other languages.

Java+PHP, Protocol Buffers or Thrift?

The answer is obvious: of the two, only Thrift supports PHP. If you look at the list of languages supported by Thrift, it will be the natural choice for many. Facebook developed it internally and it works great for them at a massive scale, which means it should be good enough for you too.
In order to use Thrift, you must download and install it. Please follow the instructions on the official site. Please note that you don't need Thrift installed to run my example, just clone the repository from GitHub: git clone

Java Server, PHP client

Let's say that we want to consume the services of a Java application from PHP. The reverse is also possible and quite easy to accomplish, as you will see. The Java application is a doughnut vending machine and the client has the possibility of having a look at what's available and ordering.
The most essential part of understanding Thrift is understanding its Interface Definition Language (IDL). Based on the IDL file(s), Thrift will generate your server and your client in any of the supported languages. Let's have a look at our IDL file:

The above defines the KrunchyKreme service and two methods: getMenu and order. getMenu takes no parameters and returns a list of Doughnut (the definition of Doughnut begins with "struct Doughnut"). The "order" method allows the client to place an order for a specific doughnutId and a quantity. Please note that the data types are platform independent and they will get translated into the appropriate types depending on the target language. You can read about Thrift IDL files here later if you wish.

Now the fun bit, let's compile our Java server and PHP client:

The Java files will be generated into the gen-java folder and the PHP files will be generated into the gen-php folder. In my sample project I have conveniently created a generate-java and generate-php script that also copies the files into the right folder structure for the Java and PHP projects.
You can generate your clients and servers in any of the supported languages, which is pretty good if you ask me. 


The Java server

My sample project contains the Java server. It uses Maven and you can build it by running the build script bin/ that just executes "mvn package". This is the server:

All this does is start a single-threaded server that handles requests by using the KrunchyKremeHandler class:

If you are wondering where KrunchyKreme.Iface came from, the answer is easy: it is generated by Thrift. So you will get an interface for your server, all you need to do now is write the actual code.  Easy, right?

Once the server is built, you can start it by running ./bin/ in the server's folder. It will start listening on port 9090 and will output debug information when clients connect and call its services.

The PHP client

You can find the sample client here. The project is based on Composer, so you will have to install the dependencies by running php composer.phar install.

The getClient() method initializes the client. Once that is done, getMenu() is called and three successive orders are placed by calling order(..). Please note the KrunchyKremeClient class, which was generated by Thrift. This gives us all the methods available to call on the server; all we have to do is use it.
You can run the client with php client.php in the client folder. This is the output:

As expected, we display the menu and then output the results of the orders. Please note the response times: 3ms to connect, 3ms to get the menu and 1ms for each order. I am running this on a basic DigitalOcean machine with 1GB of RAM, 1 processor and an SSD. Fair enough, this is a simple example, but try this over HTTP. I don't have numbers for this, but I would expect it to be about ten times worse for RESTful web service calls. Not to mention the resources used: a web server that serves web services will be much more expensive than a simple standalone server. This is why Google and Facebook don't do much REST internally and rely on Protocol Buffers and Thrift instead.
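
The call sequence and the timing described above can be sketched without a running server. This is only an illustration: FakeKrunchyKremeClient is a hypothetical stand-in for the Thrift-generated KrunchyKremeClient, and the menu data, the return value of order() and the timed() helper are assumptions made for the sketch, not part of the sample project.

```php
<?php
// Hypothetical stand-in for the Thrift-generated KrunchyKremeClient.
// The menu entries and the boolean result of order() are made up.
class FakeKrunchyKremeClient {
    public function getMenu() {
        return array(
            array('id' => 1, 'name' => 'Glazed'),
            array('id' => 2, 'name' => 'Chocolate'),
        );
    }

    public function order($doughnutId, $quantity) {
        return $quantity > 0;
    }
}

// Runs a callable and returns array(result, elapsed time in milliseconds).
function timed($callable) {
    $start = microtime(true);
    $result = call_user_func($callable);
    return array($result, (microtime(true) - $start) * 1000);
}

$client = new FakeKrunchyKremeClient();

// Fetch the menu once, then place three successive orders, timing each call.
list($menu, $ms) = timed(function () use ($client) {
    return $client->getMenu();
});
printf("getMenu: %d items in %.3f ms\n", count($menu), $ms);

foreach (array(1, 2, 1) as $doughnutId) {
    list($ok, $ms) = timed(function () use ($client, $doughnutId) {
        return $client->order($doughnutId, 3);
    });
    printf("order(%d, 3): %s in %.3f ms\n", $doughnutId, $ok ? 'ok' : 'failed', $ms);
}
```

With the real generated client in place of the stand-in, the same wrapper would measure the network round trip of each call.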

In conclusion, is the complexity worth it? It really depends on your needs, but I believe we will see Thrift more and more.

Monday, 24 March 2014

Is Facebook's Hack the future of PHP?

Recently Facebook announced that their PHP-based language Hack is publicly available. Everyone knows that Facebook is written in PHP and that it is undoubtedly the largest PHP application in the world. What many don't know is that most of their code has been ported to Hack. Also, they don't run the official PHP implementation, they run HHVM. With that in mind, could we really say that Facebook runs on PHP? Let's analyse what Hack means to the PHP community: is Hack really the evolution of PHP, or just a twist needed to satisfy Facebook's needs?

Here's the official definition of Hack:
Hack is a programming language for HHVM that interoperates seamlessly with PHP. Hack reconciles the fast development cycle of PHP with the discipline provided by static typing, while adding many features commonly found in other modern programming languages.

So Hack is statically typed. This is probably the biggest and most significant difference between PHP and Hack. Many will swear by dynamic typing, including Rasmus Lerdorf, the creator of PHP, who will tell you that everything that comes over HTTP into your application is a string. True, but the reality is that the majority of functions or methods in PHP applications will receive and return the same types over and over again. Not to mention PHP's type hinting, which highlighted a strong need for constraints in PHP. What do we lose in Hack? A certain amount of flexibility. What do we gain? Much lower memory consumption and fewer CPU cycles. You decide what you need!

Let's have a look at other notable features of Hack.
  • Generics allow classes and methods to be parameterized (i.e., a type associated when a class is instantiated or a method is called) in the same vein as statically typed languages like C# and Java:
    class Box<T> {
      protected T $data;
      public function __construct(T $data) {
        $this->data = $data;
      }
      public function getData(): T {
        return $this->data;
      }
    }
If you have worked with Java or C# before, you know that they are essential. If you worked with Java before 1.5, you know even better how much easier the life of Java programmers became with the introduction of generics. Generics definitely make sense in a statically typed language and I would have been surprised if they weren't available in Hack. Do we need them in PHP? Not really, because they wouldn't make much sense in a dynamically typed language.

  • Nullable Types are supported by Hack through use of the ? operator. This introduces a safer way to deal with nulls and is very useful for primitive types that don’t generally allow null as one of their values, such as bool and int (using ?bool and ?int respectively). The operator can be used on any type or class.

This is a nice-to-have feature. In PHP we just have to validate everything ourselves, since you can't enforce much.
  • Collections enhance the experience of working with PHP arrays, by providing first class, built-in parameterized types such as Vector (an ordered, index-based list), Map (an ordered dictionary), Set (a list of unique values), and Pair (an index-based collection of exactly two elements).
I'm really happy that additional data structures were introduced. PHP's arrays are used everywhere since there aren't many alternatives. PHP arrays are basically hashmaps which is quite wasteful if you just want to store a simple list of integers for instance. One data structure for everything is not ideal and we all know how memory hungry PHP can get if you have non-trivial structures.

  • Lambdas offer similar functionality to PHP closures, but they capture variables from the enclosing function body implicitly and are less verbose:
    function foo(): (function(string): string) {
      $x = 'bar';
      return $y ==> $x . $y;
    }
    function test(): void {
      $fn = foo();
      echo $fn('baz'); // barbaz
    }
Most definitely useful, since the PHP closures syntax and the "use" statement are quite cumbersome. The whole new syntax smells a bit like Scala if you ask me, but that's cool since Scala is supposed to be the evolution of Java.
  • Shapes are an implementation of the parameter object pattern:
    type MyShape = shape('id1' => <type>, 'id2' => <type>);
    function foo(MyShape $x): void {}
Very useful I must say, since I don't have to create the entire boilerplate myself whenever I need to use the parameter object pattern.

  • Type aliasing allows existing types to be redefined as new types:
    type MyInt = int;
    function foo(MyInt $mi): void {}
    newtype Point = (int, int);
    function foo(Point $x): void {}
Do I smell Scala influence again? Yes, type aliasing makes sense for Hack, but you couldn't call this PHP code, even with the $x giveaway.
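
To make the earlier comparison between Hack lambdas and PHP closures concrete, here is the same foo() example written as a plain PHP closure. It is a small sketch rather than anything taken from the Hack documentation; the point is that $x has to be captured explicitly with "use", whereas the Hack lambda captures it implicitly.

```php
<?php
// PHP closure equivalent of the Hack lambda example: $x must be
// listed explicitly in use (...) to be visible inside the closure.
function foo() {
    $x = 'bar';
    return function ($y) use ($x) {
        return $x . $y;
    };
}

$fn = foo();
echo $fn('baz'), "\n"; // barbaz
```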

As you've seen, the major shift made by Hack is static typing, and most of its other features follow from that. So there are lots of cool features in Hack. The question is, if Hack is so good, why isn't everyone jumping on it? I'm afraid there are quite a few reasons for that:
  • Hack runs on top of HHVM. How many hosting companies offer an HHVM environment? None would probably be about right. So if you want Hack, you will have to run your own HHVM setup.
  • HHVM is not easy to install. There aren't packages for many platforms at the moment, so you will probably need to build HHVM. That's not easy, it has lots and lots of dependencies and it needs quite recent versions of certain libraries, which is a problem on a lot of systems. In time this won't be a problem, people will start building proper packages and we'll be able to use them.
  • HHVM and Hack are not 100% PHP compatible.. yet. That means you will have to tune your applications slightly to make them run on HHVM. It's not a big task in most cases, but don't expect existing applications to just run on HHVM.
  • HHVM only supports a limited list of extensions. PHP extensions cannot be used under HHVM, and the list of supported extensions is not very comprehensive. If you use something else, you will have to either find alternatives, write the extension yourself, or wait for the extension to become available. More and more extensions will probably be available in the future, but don't hold your breath for your favourite one.
  • Hack and HHVM are still young. Even though they've been used by Facebook for some time, it's only now that they had the confidence to make them available to everyone. It will take some time for people to be confident enough to use them.
  • The HHVM documentation is not great. Trying to figure out how to do things from the GitHub wiki can prove challenging. I'm sure this will get better in time.
  • The Hack and HHVM community is almost nonexistent at the moment. If you have a problem, it will not be as easy as searching for PHP problems, where someone has probably already had the same issue and the answers are readily available on Stack Overflow.
In conclusion, Hack+HHVM have great benefits, but there are serious drawbacks which will stop people from adopting them straight away. People who really need HHVM and Hack will make the effort to adapt their applications in order to benefit from their advantages. I personally would go into production with HHVM without looking back, but the majority of small and medium PHP applications won't rush into migrating. Hack won't replace PHP any time soon; best case scenario, they will coexist and Hack's usage will grow steadily.
So to answer the initial question: is Hack the future of PHP? I would say yes, but I don't see Hack being as popular as PHP any time soon.

Friday, 31 May 2013

Updating InfusionSoft Custom Fields

This is a quick post, and I'm writing it mainly because I wasted a lot of time trying to programmatically update custom fields in Infusionsoft. So here goes: if you need to update a custom field, you need to know the id of the field and the new value:

 updateCustomField($fieldId, $fieldValues)  

You can find the id of the field by going to Admin -> Settings -> Set up custom fields for -> Go; when you hover over the field you are interested in, you will see the id.
Getting the value is trickier. What works on any type of field is setting a value manually and then querying DataFormField for the value like this:
 $returnFields = array('DataType', 'Id', 'FormId', 'GroupId', 'Name', 'Label', 'DefaultValue', 'Values', 'ListRows');   
 $query = array('Id' => custom_field_id);   
 $res = $sdk->dsQuery("DataFormField", 10, 0, $query, $returnFields);   
In the result above, have a look at the "Values" field: this is what the value you send should look like.
So to conclude, if you need to update a listbox custom field, this will do it: 

 // custom_field_id is a placeholder for the id of your custom field
 $values = array(
     'Values' => "\naaa\nbbb\nddd"
 );
 $result = $sdk->updateCustomField(custom_field_id, $values);
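
If the list of options lives in a PHP array, the "Values" string can be built instead of typed by hand. This is a minimal sketch: buildListboxValues() is a hypothetical helper, and the format (every option preceded by a newline) is simply inferred from the example above, so compare it with what dsQuery returns for your own field.

```php
<?php
// Hypothetical helper: builds the newline-delimited string used in the
// "Values" field of a listbox, where every option is preceded by "\n".
function buildListboxValues(array $options) {
    return "\n" . implode("\n", $options);
}

$values = array(
    'Values' => buildListboxValues(array('aaa', 'bbb', 'ddd')),
);

// Same string as the hand-written example above.
var_dump($values['Values'] === "\naaa\nbbb\nddd"); // bool(true)
```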

Monday, 19 November 2012

Couchbase smart clients

While doing some research on replacing Memcached with Couchbase for the company I work for, I came across the term "Couchbase smart client". The problem is that I couldn't find a decent explanation (in my opinion) of what exactly it does. As you will find out next, the smart client is maybe the most important aspect when it comes to connecting to a Couchbase cluster.
Let's have a look first at what the Couchbase documentation says about smart clients:
When using a smart client, the client library provides an interface to the cluster, and performs server selection directly via the vBucket mechanism. The clients communicate with the cluster using a custom Couchbase protocol which enables the sharing of the vBucket map, and the selection within the client of the required vBucket when obtaining and storing information

I couldn't figure out how that affects me as a client of the cluster, so I decided to have a look at the low-level calls that a simple client makes in order to connect to the cluster and write a key. Below is the source code (PHP):

$servers = array('', '');
foreach ($servers as $server) {
        $couch = @(new Couchbase( $server )); // uses the default bucket
        if($couch->getResultCode() === Couchbase::SUCCESS) {
                break; // connected, no need to try the next server
        }
}

$couch->set('kk', 1);

echo "Done";

Let's run strace for this script:
strace php ./bin/Temp/couchbase_connect.php   2> /tmp/couch_connect

The output is rather verbose, but we are only interested in the following bits:

  • #1 connect(6, {sa_family=AF_INET, sin_port=htons(8091), sin_addr=inet_addr("")}, 16) = 0
  • #2 sendto(6, "GET /pools/default/bucketsStream"..., 56, 0, NULL, 0) = 56
  • #3 recvfrom(6, "HTTP/1.1 200 OK\r\nTransfer-Encodi"..., 2048, 0, NULL, NULL) = 225
  • #4 connect(7, {sa_family=AF_INET, sin_port=htons(11210), sin_addr=inet_addr("")}, 16) = -1 EINPROGRESS (Operation now in progress)

The above is a massive simplification of what is happening, but it's enough to give us a clue of what the client is doing:

  • at line 1 the client is connecting to the first server in the pool on port 8091
  • at line 2 the client is requesting information about the servers in the cluster
  • at line 3 the client starts receiving that information
  • at line 4 the client is initiating a connection to the server in the pool that is responsible for the key that we want to write, port 11210

To make this short and sweet, the client is connecting to one of the servers in the pool and by receiving information about the cluster, it is able to decide which of the servers is handling the keys it needs. So what makes the client "smart" is the fact that it has knowledge about the cluster and it makes decisions based on the current state of the cluster.
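
The decision the client makes can be sketched in a few lines of PHP. This is a deliberately simplified illustration, not the real client code: the node names, the 8-entry map and the CRC32-modulo hash are all assumptions, while a real cluster sends the vBucket map to the client and uses its own hash and many more vBuckets.

```php
<?php
// Hypothetical cluster: two nodes and a tiny map of 8 vBuckets.
// In a real cluster the map is received from the server, not hard-coded.
$servers = array('node-a:11210', 'node-b:11210');
$vBucketMap = array(0, 1, 0, 1, 0, 1, 0, 1); // vBucket id => index into $servers

// Hash the key into a vBucket id (simplified; the real hash differs in detail).
function vBucketForKey($key, $numVBuckets) {
    return (crc32($key) & 0x7fffffff) % $numVBuckets;
}

// Look up which server owns the vBucket that the key falls into.
function serverForKey($key, array $vBucketMap, array $servers) {
    $vBucket = vBucketForKey($key, count($vBucketMap));
    return $servers[$vBucketMap[$vBucket]];
}

echo "Key 'kk' is handled by ", serverForKey('kk', $vBucketMap, $servers), "\n";
```

When a node is added or removed, only the map changes; the client keeps hashing keys the same way and simply follows the new map.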

You can start seeing the advantages of such a smart client when you think about not so smart clients like memcached, where you would have to implement all the logic about the state of the cluster on the client side, which is no trivial task by most people's standards.

Saturday, 19 June 2010

PHP remote debugging with Xdebug and Eclipse PDT

Debugging is an invaluable part of software development. I find it very useful in a variety of situations, for instance when I want to understand how a routine works or I need to get rid of a bug that is not exactly easy to fix just by reading the code.

There are several ways to perform debugging in PHP:
  • The most straightforward technique is to use print_r() and var_dump(). This will alter the output; it's quick but very dirty. If you're using this there's nothing to be ashamed of, everyone is doing it.
  • Logging into files/database tables at specific points in the code. This is cleaner than the previous method, but it requires additional effort and usually pollutes the code with logging routines. Also, this is not exactly debugging, it's logging and analysing the logs.
  • Using proper debugging tools like Xdebug or the Zend Debugger, integrated into your PHP IDE. This is the clean way to do it, it provides a much better insight into the source code, as you can run it interactively, step by step.
My main goal in this post is to show you how to set up your debugging environment with Eclipse PDT and Xdebug. If you're not already using it, download Eclipse PDT and install it. Next you will have to get and install Xdebug on the machine where PHP runs (it can be the same machine or some remote machine). You should be able to get it through PECL with the following command:

pecl install xdebug

If the above does not work, check the official Xdebug installation instructions.

Once the Xdebug extension has been installed, you will have to enable it in php.ini. Add the following lines to php.ini:


On Windows+PHP 5.2.14 I had to replace zend_extension with zend_extension_ts:

Be extra careful with xdebug.remote_host, this is the host where you develop and run your Eclipse, and PHP will try and connect to Eclipse when debugging is enabled. Also make sure that the zend_extension part was not added automatically by the installation, if it was don't add it again.
If there is any mention of the Zend debugger in your php.ini file, you will have to comment it out. Restart Apache or whatever web server you're using and make sure the Xdebug installation was correct by running a simple PHP script that contains phpinfo() and searching for "xdebug".
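
As an alternative to scanning the phpinfo() page, a short script can report whether the extension is loaded. This is a sketch under a couple of assumptions: xdebugStatus() is a hypothetical helper, and it reads the xdebug.remote_host setting mentioned above, which may not exist under that name in other Xdebug versions (ini_get() then returns false).

```php
<?php
// Returns a one-line summary of whether the Xdebug extension is loaded.
function xdebugStatus() {
    if (!extension_loaded('xdebug')) {
        return 'Xdebug is NOT loaded; check the zend_extension line in php.ini';
    }

    // ini_get() returns false when the setting is unknown to this version.
    $host = ini_get('xdebug.remote_host');

    return sprintf(
        'Xdebug %s is loaded, xdebug.remote_host=%s',
        phpversion('xdebug'),
        $host === false ? '(not set)' : $host
    );
}

echo xdebugStatus(), "\n";
```

Run it through the web server rather than the command line, since the two often load different php.ini files.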

Now the tricky part, Eclipse has to be configured to accept debugging sessions from XDebug. Follow the steps below:
  1. Open your project in Eclipse PDT
  2. In the main menu select Project->Properties
  3. On the left side of the window select "PHP Debug" and then click on "Configure Workspace Settings"
  4. On the "PHP Debugger" dropdown select Xdebug and click "Apply"
  5. Click "Configure" to the right of Xdebug in the same window.
  6. Select Xdebug and click "Configure".
  7. On the "Accept remote session(JIT)" select "any" and click "OK". This is extremely important and this is where most people get stuck.
That's it, Eclipse is now configured. All we need now is to be in control of our debugging sessions. For this we will need to install a Firefox extension called "easy Xdebug" (yes Firefox, you're not developing PHP in IE, are you?).

The extension is available as a Firefox add-on; if you cannot find it, just google "firefox xdebug". Install the extension and restart Firefox. After that you will notice a little green bug on the bottom-right of Firefox, and if you hover over it, it says "Start xdebug session".

As a side note, you might have used Zend IDEs where the debug process starts from the IDE. In Eclipse PDT the process is reversed: you start from the page that you want to debug and PHP will connect to Eclipse in order to establish a debug session. That is why we have installed the Firefox extension: the debug session starts from the browser.

Now open the page that you want to debug, on the server where you have just configured PHP with the XDebug extension of course. Click on the green bug I just mentioned to enable debugging and then reload the page. After this you will have to go to Eclipse and see that a new window has just popped up, asking you to "Select the local resource that matches the following server path". In a simple setup you will have just a single option, select the PHP file in that window and click "OK". Eclipse will ask you if you want to change to "PHP Debug perspective" and obviously you have to say "Yes". Optionally you can also check "Remember my decision". After this you should be in the debugging perspective, with Eclipse stopped on the first line of your code, meaning that you can now step through your code.

As a simple guideline you can use the following keys:
  • F5(Step Into) - steps into everything including function or method calls
  • F6(Step Over) - walks through but does not step into function or method calls
  • F8(Resume) - runs until the first breakpoint or end of the program
Breakpoints can be placed by double-clicking on the right of the line where you need the breakpoint. Try and play with the above keys to get a better idea of how they work.

That's it, I'm sure you'll realise that you can't live without debugging once you start using it.

Wednesday, 20 May 2009

Quickstart web services with SOAP and Zend Framework

Web services are software systems designed to support interoperable machine-to-machine interaction over a network. Nowadays if you want to connect external systems, you probably want or have to use web services. What I will discuss here is how to get your own SOAP web service up in minutes.

SOAP (Simple Object Access Protocol) is probably the most used web service protocol today. It relies on XML as its message format, and it uses HTTP for message transmission. The SOAP server uses WSDL (Web Services Description Language) to describe its services to external clients. WSDL is simply an XML-based language that provides a model for describing web services.

Back in the old days you had to know a lot about SOAP and WSDL to create a web service. Have a look at any raw WSDL file to see what I mean. Definitely not very good looking. Luckily Zend Framework has a nice component, Zend_Soap, that handles all the SOAP hard work you would otherwise have to do.

So without further ado, here's the code (since we are discussing a Zend Framework component, the code presented here uses the Zend MVC, but you can use Zend_Soap without it):

This is the source code for the controller:
require_once realpath(APPLICATION_PATH .

class SoapController extends Zend_Controller_Action
{
    //change this to your WSDL URI!
    private $_WSDL_URI = 'http://URL_TO_WEB_SERVICE/soap?wsdl';

    public function indexAction()
    {
        if(isset($_GET['wsdl'])) {
            //return the WSDL
            $this->hadleWSDL();
        } else {
            //handle SOAP request
            $this->handleSOAP();
        }
    }

    private function hadleWSDL() {
        //generate the WSDL from the Soaptest class and its PHPDoc blocks
        $autodiscover = new Zend_Soap_AutoDiscover();
        $autodiscover->setClass('Soaptest');
        $autodiscover->handle();
    }

    private function handleSOAP() {
        $soap = new Zend_Soap_Server($this->_WSDL_URI);
        $soap->setClass('Soaptest');
        $soap->handle();
    }

    public function clientAction() {
        $client = new Zend_Soap_Client($this->_WSDL_URI);

        $this->view->add_result = $client->math_add(11, 55);
        $this->view->not_result = $client->logical_not(true);
        $this->view->sort_result = $client->simple_sort(
            array("d" => "lemon", "a" => "orange",
                  "b" => "banana", "c" => "apple"));
    }
}


And the code for the Soaptest.php class:

class Soaptest {
    /**
     * Add method
     * @param Int $param1
     * @param Int $param2
     * @return Int
     */
    public function math_add($param1, $param2) {
        return $param1+$param2;
    }

    /**
     * Logical not method
     * @param boolean $param1
     * @return boolean
     */
    public function logical_not($param1) {
        return !$param1;
    }

    /**
     * Simple array sort
     * @param Array $array
     * @return Array
     */
    public function simple_sort($array) {
        asort($array); //sort by value, keeping the keys
        return $array;
    }
}

You can also download the full project here.

As you can see you don't have to write a lot of code to back up the web service.

Let's discuss the controller first, because that's where the “magic” happens. The index action handles two types of requests: the request for the WSDL, handled by the hadleWSDL() method, and the actual SOAP request, handled by the handleSOAP() method.

You can go ahead and see what your WSDL looks like by accessing http://URL_TO_WEB_SERVICE/soap?wsdl , where URL_TO_WEB_SERVICE is the URL where you have deployed the example. Now imagine that you had to construct and maintain this yourself, by hand, as old school bearded guys would. Well you don't, because this is handled by Zend_Soap_AutoDiscover, which will create the WSDL file for you. The only thing that Zend_Soap_AutoDiscover needs to know is the class you want to use for the web service. Also, because PHP is not strongly typed, you will have to write PHPDoc blocks, because SOAP needs to know what types you are using as parameters and what types you are returning. Have a look here if PHPDoc does not ring a bell.
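
To see why the PHPDoc blocks matter, here is a small sketch of how type information can be read from them with PHP's reflection API. It only illustrates the idea and is not how Zend_Soap_AutoDiscover is actually implemented; SoaptestSketch and docblockTypes() are hypothetical names introduced for the example.

```php
<?php
// Hypothetical class with the same kind of PHPDoc block as Soaptest.
class SoaptestSketch {
    /**
     * Add method
     * @param Int $param1
     * @param Int $param2
     * @return Int
     */
    public function math_add($param1, $param2) {
        return $param1 + $param2;
    }
}

// Reads the @param and @return types from a method's doc comment.
function docblockTypes($class, $method) {
    $reflection = new ReflectionMethod($class, $method);
    $doc = (string) $reflection->getDocComment();

    preg_match_all('/@param\s+(\S+)/', $doc, $params);
    preg_match('/@return\s+(\S+)/', $doc, $return);

    return array(
        'params' => $params[1],
        'return' => isset($return[1]) ? $return[1] : null,
    );
}

print_r(docblockTypes('SoaptestSketch', 'math_add'));
```

Without the doc comment there is nothing to read, which is why a method with no PHPDoc block cannot be described properly in the WSDL.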

The SOAP server is handled by the Zend_Soap_Server class, and all it needs is the class you intend to use for the web service, and the URI to your WSDL file. Remember when you checked out how the WSDL file looks? That's exactly the URI you will have to use. In the example you will have to put that into the $_WSDL_URI variable, defined in the SoapController.

That was the SOAP server. Simple, right? Now let's run some tests against the server by implementing a simple SOAP client. The client is handled by the Zend_Soap_Client class, which is constructed in the same manner as the server class: it needs just the URI to the WSDL file. After you have constructed the client, you can access the methods defined by the SOAP server in the same way you would access the methods of an object. In the example above you have a simple class, called Soaptest, that defines three very simple methods. Feel free to change the class and test your own methods. While you are playing with the server, you might notice that the WSDL file is cached, so if you change something in the Soaptest.php file, you might not get the expected result. Just delete the cached WSDL file from /tmp/wsdl-* while you do your tests.

You definitely want to have a look at the Zend Framework documentation located here.

That was it, as promised: your SOAP web service up in minutes.