Blog: PHP, Python, Linux, Web services & Continuous delivery

Vendor Branches with Git (including third party code)

In Subversion, externals and vendor branches are used to include code from other repositories in your own projects. Externals are used when we want to include code but have no need to modify it. A vendor branch is used when we want to include code that we also need to customize, e.g. for a bug fix or security patch that cannot wait for the maintainers of the third party code.

I've outlined three methods that you can use in git to include third party code:

  1. Git submodules
  2. Git Submodules with Upstream branching
  3. Fork and submodule

There are other techniques out there, such as subtree merging, but this is not a widely adopted approach and in my experience it makes future maintenance more awkward than it needs to be.


#1 Git Submodules

Use when you want to:

  • Integrate third party code that you will never need to modify into a subdirectory of your current project
  • Track upstream changes to the third party code

This is the most straightforward and most common way of including code from one git repo in another. Although at a glance it may resemble a Subversion external, there are some important differences.

  • A git submodule will clone the entire target git repository, whereas a Subversion external can point at any part of a target SVN repository.
  • Git submodules are not updated automatically on every pull, whereas in Subversion your externals are kept up to date by default. With Git the submodule references a particular commit, usually the HEAD at the time you added the submodule.
  • Git submodules are not automatically cloned (see below) when you clone the main repository.

Although this may sound a bit more restrictive at first, it's actually a good thing: each of your repositories is much more likely to be organized for a single purpose, which helps keep things a lot cleaner. I've seen a few too many monolithic and slow Subversion repositories that have become a dumping ground for multiple projects.
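
To make the "references a particular commit" point concrete, here is a minimal sketch that builds two throwaway local repositories and inspects how the parent records the submodule. The names lib, app and vendor/lib and the helper g are made up for illustration. The protocol.file.allow setting is only there because recent git versions refuse local-path submodules by default; it is not needed for normal remote URLs.

```shell
set -e
work=$(mktemp -d) && cd "$work"
# Hypothetical helper: git with a throwaway identity, allowing local-path submodules
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

# A stand-in "third party" repo with one commit
git init -q lib
g -C lib commit -q --allow-empty -m "lib v1"
lib_head=$(git -C lib rev-parse HEAD)

# Our own project, which includes the library as a submodule
git init -q app
g -C app commit -q --allow-empty -m "initial"
g -C app submodule --quiet add "$work/lib" vendor/lib
g -C app commit -q -m "add lib submodule"

# The parent stores a "gitlink" tree entry (mode 160000) naming one exact commit
entry=$(git -C app ls-tree HEAD vendor/lib)
echo "$entry"

# A new commit in the library does not move the commit pinned by the parent
g -C lib commit -q --allow-empty -m "lib v2"
pinned=$(git -C app rev-parse HEAD:vendor/lib)
echo "pinned: $pinned"
```

The pinned commit stays at the library's first commit until someone updates the submodule and commits the new reference in the parent repository.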

Adding a git submodule

Submodules are added with just a few commands:

# Clone an existing repo (or create a new one) and move into it
git clone git@github.com:pipe-devnull/repoA.git
cd repoA

# Add the submodule. The first argument is the address of the repo you want to
# include (repoB) and the second is the path within your repo where it will live
git submodule add git@github.com:pipe-devnull/repoB.git myLib/3rdPartySubmoduleExample

# "git submodule add" stages the new path and the submodule definition file
# for you, so all that is left is to commit and push to master
git commit -m "added new submodule example"
git push origin master

You should notice that a new file called .gitmodules has been created in the root of the repository. This file contains the definitions of all submodules that belong to this repository; if you open it you will see the mapping between each local path and the third party repo it includes.

[submodule "myLib/3rdPartySubmoduleExample"]
    path = myLib/3rdPartySubmoduleExample
    url = git@github.com:pipe-devnull/repoB.git
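
Because git keeps this mapping in its ordinary config file format (in a file named .gitmodules), it can be queried with git config rather than parsed by hand. The sketch below recreates the content shown above in a temporary directory purely so it can be run anywhere; in a real project you would run the git config lines from the repository root.

```shell
work=$(mktemp -d) && cd "$work"

# Recreate the submodule definition shown above
cat > .gitmodules <<'EOF'
[submodule "myLib/3rdPartySubmoduleExample"]
    path = myLib/3rdPartySubmoduleExample
    url = git@github.com:pipe-devnull/repoB.git
EOF

# List the URL of every submodule defined in the file
git config -f .gitmodules --get-regexp '^submodule\..*\.url$'

# Look up a single value by its full key
url=$(git config -f .gitmodules --get 'submodule.myLib/3rdPartySubmoduleExample.url')
echo "$url"
```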

Git Submodules are not automatically cloned!

The annoying thing about submodules is that when you clone a repo that already contains submodules, the submodules are not automatically pulled down to your local copy. After the initial clone has finished you also need to run a git submodule init followed by a git submodule update.

# Make a fresh clone of a git repository and move into it
git clone git@github.com:pipe-devnull/repoA.git
cd repoA

# Initialize submodules
git submodule init

# Pull down all submodule content locally
git submodule update

Likewise, when you pull changes into your main git repository by running git pull, the submodules are not updated automatically. Running git submodule update checks out whichever commit the main repository currently records for each submodule; to move a submodule forward to the latest upstream code you have to explicitly run git pull within it. If you have multiple submodules the following command may be useful:

git submodule foreach git pull origin master
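
Pulling inside a submodule only changes your working copy. The parent repository keeps pointing at the old commit until you commit the new reference. The following sketch shows that extra step with throwaway local repositories; lib, app, vendor/lib and the helper g are made-up names, and a single submodule is pulled directly rather than through foreach to keep things short.

```shell
set -e
work=$(mktemp -d) && cd "$work"
# Hypothetical helper: git with a throwaway identity, allowing local-path submodules
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

# Stand-in third party repo on a branch called master
git init -q lib
git -C lib symbolic-ref HEAD refs/heads/master
g -C lib commit -q --allow-empty -m "lib v1"

# Parent project with the library added as a submodule
git init -q app
g -C app commit -q --allow-empty -m "initial"
g -C app submodule --quiet add "$work/lib" vendor/lib
g -C app commit -q -m "add lib submodule"

# The library moves on
g -C lib commit -q --allow-empty -m "lib v2"

# Pull the new commit inside the submodule
cd app
g -C vendor/lib pull -q origin master

# The parent now lists vendor/lib as modified; commit the new reference
git status --short
g add vendor/lib
g commit -q -m "bump lib submodule"
recorded=$(git rev-parse HEAD:vendor/lib)
```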

EDIT: It's far easier to add the --recursive flag to your original clone command, which will also initialize and update any submodules immediately after the initial clone has completed.
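
As a rough illustration, the sketch below clones the same parent repository twice: once plainly, then filling in the submodules with git submodule update --init (which combines the init and update steps), and once with --recursive. Everything is local and throwaway; lib, app, vendor/lib and the helper g are made-up names.

```shell
set -e
work=$(mktemp -d) && cd "$work"
# Hypothetical helper: git with a throwaway identity, allowing local-path submodules
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

# Stand-in third party repo containing one file
git init -q lib
echo "third party lib" > lib/README
g -C lib add README
g -C lib commit -q -m "lib v1"

# Parent project that includes it as a submodule
git init -q app
g -C app commit -q --allow-empty -m "initial"
g -C app submodule --quiet add "$work/lib" vendor/lib
g -C app commit -q -m "add lib submodule"

# Option 1: plain clone, then initialize and update in a single command
g clone -q app plain
g -C plain submodule --quiet update --init

# Option 2: let clone do it all
g clone -q --recursive app fresh

ls plain/vendor/lib fresh/vendor/lib
```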


#2 Upstream branching and git submodules

Use when you want to:

  • Integrate third party code in a subdirectory of your current project in and amongst your own files
  • Make custom modifications to the third party library
  • Track upstream changes to the third party code

As mentioned earlier, submodules let you include external code but give you nowhere to keep your own modifications to it, unless you can push to the third party repository. It's often the case that you start using a library supplied by a third party and soon find yourself needing to make some changes to it in order to support your own application. Even if you try to contribute your changes upstream, the contribution process could take months depending on the project.

Given this situation you now need to include the library, track any upstream changes and maintain the changes you have made. In Subversion we would use a vendor branch to handle this situation; in git we have to do things slightly differently.

  1. Create a new repository containing only a README file, with two branches: master and upstream.
  2. Get a copy of the 3rd party library code, unzip / untar it into the upstream branch, add the files and commit them. Make sure the copy does not contain a .git directory.
  3. Create a tag on the upstream branch that corresponds to the version of the library (e.g. 1.0.1.1).
  4. Switch back to the master branch and merge all contents from the upstream branch.
  5. Make your modifications on the master branch.
  6. Add this repository as a submodule of your project.

In time, when a new version of the library is released, we can update the upstream branch with it and then merge those changes into the master branch, which also contains your own modifications. Any conflicts can be resolved in this repo, away from your main project, which is kept nice and clean. If a future version of the library removes the need for you to maintain your customizations, you can just replace the submodule definition in your main project so that it points at the original library.
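
Replacing the submodule definition later on could look something like the sketch below. It assumes Git 2.25 or newer, where git submodule set-url is available (on older versions you would edit the definition by hand and run git submodule sync). The repositories are local throwaways, and the names lib, custom-lib, app, vendor/lib and the helper g are made up.

```shell
set -e
work=$(mktemp -d) && cd "$work"
# Hypothetical helper: git with a throwaway identity, allowing local-path submodules
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

# "lib" stands in for the original library, "custom-lib" for our patched copy
git init -q lib
git -C lib symbolic-ref HEAD refs/heads/master
g -C lib commit -q --allow-empty -m "lib v1"
g clone -q lib custom-lib

# The main project currently uses the patched copy
git init -q app
g -C app commit -q --allow-empty -m "initial"
g -C app submodule --quiet add "$work/custom-lib" vendor/lib
g -C app commit -q -m "add customized lib"

# Point the submodule back at the original library and record the change
cd app
g submodule set-url vendor/lib "$work/lib"
new_url=$(git config -f .gitmodules --get submodule.vendor/lib.url)
origin_url=$(git -C vendor/lib remote get-url origin)
g commit -q -am "point vendor/lib back at upstream"
echo "$new_url"
```

After this you would still fetch inside the submodule, check out the upstream commit you want, and commit the new reference in the main project.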

Doing it command by command

# Clone new clean repo (only contains readme) and move into it
git clone git@github.com:pipe-devnull/upstream-branching.git
cd upstream-branching
# Create the upstream branch
git checkout -b upstream
# Push the new branch back to origin
git push origin upstream
# Unpack the third party lib into the upstream branch
# (assumes the archive was saved one level up, outside the repo)
tar -xzf ../thirdPartyLib.tar.gz
# add, commit, tag and push back up to remote (the tag must be pushed explicitly)
git add -A
git commit -m "V1 lib"
git tag "v1.0.0"
git push origin upstream v1.0.0

# Switch back to master, merge from upstream and push to master
git checkout master
git merge upstream
git push origin master

With the above done you can make your customizations to the library on the master branch and add the repository as a submodule of your own project as described in #1. Fast forward a few days, weeks or months to when a new release of the third party library is available, and you can upgrade your copy as follows.

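# Branch master, this will be our upgrade branch
git checkout master
git branch upgrade
# Switch to the original upstream branch and remove the old copy of the library
git checkout upstream
rm -rf ThirdPartyLib
# Unpack the new copy of the library
# (assumes the archive was saved one level up, outside the repo)
tar -xzf ../thirdPartyLib.tar.gz
# Stage additions, changes and deletions, then commit and tag the new release
git add -A
git commit -m "V2 lib"
git tag "v2.0.0"
# Switch back to upgrade branch, merge from upstream
git checkout upgrade
git merge upstream
# Resolve any conflicts and then merge into master
git checkout master
git merge upgrade
# Publish the result so projects using this repo as a submodule can pick it up
git push origin master upstream v2.0.0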
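
If you want to rehearse the whole cycle without touching a real remote, the following sketch replays it in a throwaway local repository. The file lib.txt, its contents and the helper g are invented for the example, and a one-line edit stands in for "your customizations". It also shows a side benefit of the approach: your local changes are always recoverable as the diff between upstream and master.

```shell
set -e
work=$(mktemp -d) && cd "$work"
# Hypothetical helper: git with a throwaway identity
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

git init -q vendor-lib && cd vendor-lib
git symbolic-ref HEAD refs/heads/master
echo "vendor copy of lib" > README
g add README && g commit -q -m "readme"

# upstream holds pristine releases only
git checkout -q -b upstream
printf '%s\n' one two three four five > lib.txt
g add -A && g commit -q -m "V1 lib" && git tag v1.0.0

# master = upstream + our own patch
git checkout -q master
g merge -q --no-edit upstream
printf '%s\n' ONE-patched two three four five > lib.txt
g commit -q -am "our customization"

# A new release arrives: import it on upstream, merge via an upgrade branch
git branch upgrade
git checkout -q upstream
printf '%s\n' one two three four five six > lib.txt
g commit -q -am "V2 lib" && git tag v2.0.0
git checkout -q upgrade
g merge -q --no-edit upstream
git checkout -q master
g merge -q --no-edit upgrade

# Our customizations are exactly the difference between the two branches
patch=$(git diff upstream master)
echo "$patch"
```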

#3 Fork and submodule

Use when you want to:

  • Integrate third party code in a subdirectory of your current project away from your own files
  • Make custom modifications to the third party library
  • Track upstream changes to the third party code

This is a good and easy option when you want to make some changes to a third party submodule that is not mixed in amongst your own files. Fork the target repo and then include that fork as the submodule rather than the original version.

You can update the fork as often as you like and commit your own changes to it. The repo that contains the submodule can remain isolated from the third party code, but you still get to safely make the changes to the third party code that you needed to.
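
A minimal sketch of that workflow, using local throwaway repositories in place of GitHub, might look like this. Here lib stands in for the third party project and fork.git for your fork of it; those names, plus fork-work, app, vendor/lib, lib.txt and the helper g, are all made up for the example. The remote name upstream is a common convention rather than anything git requires.

```shell
set -e
work=$(mktemp -d) && cd "$work"
# Hypothetical helper: git with a throwaway identity, allowing local-path submodules
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

# Stand-in for the third party project
git init -q lib
git -C lib symbolic-ref HEAD refs/heads/master
printf '%s\n' one two three four five > lib/lib.txt
g -C lib add -A && g -C lib commit -q -m "upstream v1"

# Stand-in for our fork (on a hosting service this is the "fork" action)
git clone -q --bare lib fork.git

# Work on the fork, keeping a second remote that points at the original
g clone -q fork.git fork-work && cd fork-work
git remote add upstream "$work/lib"
printf '%s\n' ONE-patched two three four five > lib.txt
g commit -q -am "our custom fix"
g push -q origin master

# Later: the original project releases something new; bring it into the fork
printf '%s\n' one two three four five six > "$work/lib/lib.txt"
g -C "$work/lib" commit -q -am "upstream v2"
g fetch -q upstream
g merge -q --no-edit upstream/master
g push -q origin master

# The main project includes the fork, not the original
cd "$work" && git init -q app
g -C app commit -q --allow-empty -m "initial"
g -C app submodule --quiet add "$work/fork.git" vendor/lib
g -C app commit -q -m "add forked lib as submodule"
merged=$(tr '\n' ' ' < app/vendor/lib/lib.txt)
echo "$merged"
```

The submodule ends up with both the custom fix and the newer upstream change, while the main project only ever references the fork.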
