Thursday, 12 March 2015

Reduce your git repository size in minutes

Every git repository accumulates history over time. Git was not built for binary files, but people store them in repositories anyway, and that adds to the growth. Even after you delete a binary file, every old version of it stays in the history, and looking back at the history of an image is not something you do every day. So why not remove the massive blobs from history altogether? I know, that means rewriting history, and that is dangerous, isn't it? Not quite, thanks to a nice tool called bfg (the BFG Repo-Cleaner).

Right, let's start:

1. Download bfg or install it via brew, yum, etc.

2. Create a bare clone of your git repository:
git clone --mirror git://something.com/big-repo.git

3. Create a backup of the repository (just in case):
cp -r big-repo.git big-repo.git_bak

4. Run bfg against the mirror:
bfg -b 100K big-repo.git

This removes every blob larger than 100K from the history, but don't worry: the files in your latest commit (HEAD) are protected and left untouched. There are many other options (including protecting other branches); have a look at the bfg documentation or just run bfg with no arguments to see them.
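Before settling on a threshold such as 100K, it can help to see which blobs in the history are actually the biggest. Here is a minimal sketch that uses only stock git commands. The function name largest_blobs is made up for illustration and is not part of git or bfg.

```shell
# Hypothetical helper (not part of git or bfg): list the largest blobs
# anywhere in a repository's history, one "<size-in-bytes> <path>" per line.
# Usage: largest_blobs <repo-path> [count]
largest_blobs() {
    repo=$1
    count=${2:-10}
    # Walk every object reachable from any ref, ask git for its type and
    # size, keep only blobs, then sort by size, largest first.
    git -C "$repo" rev-list --objects --all |
        git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |
        sort -rn |
        head -n "$count"
}

# Example call against the mirror from step 2:
#   largest_blobs big-repo.git 20
```

Sizes are reported in bytes, so anything above 102400 would be caught by the 100K threshold used above.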

5. Run git gc to actually remove the files:
cd big-repo.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
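To check how much space was reclaimed, you can compare the packed size before and after the cleanup. This is a rough sketch built on git count-objects. The helper name pack_size_kib is an assumption for illustration only.

```shell
# Hypothetical helper: print the total size of a repository's pack files
# in KiB, as reported on the "size-pack" line of `git count-objects -v`.
# Usage: pack_size_kib <repo-path>
pack_size_kib() {
    git -C "$1" count-objects -v | awk '/^size-pack:/ { print $2 }'
}

# Example: compare the backup from step 3 with the cleaned mirror
#   pack_size_kib big-repo.git_bak
#   pack_size_kib big-repo.git
```

Since the backup is an untouched copy, the difference between the two numbers is roughly what bfg and git gc removed.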

6. [Optional] Create a new repo to push the changes to. I like to push into a new repo to be 100% sure that the repository is in a good state. Before pushing, change the url of the remote "origin" in big-repo.git/config.
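If you would rather not edit the config file by hand, git remote set-url should do the same job. This is a small sketch. The function name retarget_origin and the URL in the example are placeholders, not values from this post.

```shell
# Hypothetical helper: point the "origin" remote of a repository at a new
# url and print the stored value to confirm the change.
# Usage: retarget_origin <repo-path> <new-url>
retarget_origin() {
    git -C "$1" remote set-url origin "$2"
    git -C "$1" config --get remote.origin.url
}

# Example (placeholder url for the freshly created repo):
#   retarget_origin big-repo.git git://something.com/lean-repo.git
```

Only the url changes, so the mirror settings created by git clone --mirror stay in place for the push.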

7. Push the changes:
git push

8. Done. Enjoy your lean repository!
