
Merging Git Repositories

No project of significant size that I've ever seen has retained its initial structure. Restructuring projects is a fact of life, but unfortunately Git doesn't make it easy.

Fundamentally this stems from the way Git works, treating changes as a succession of snapshots and not storing any other metadata. Of course this is part of what makes Git fast and efficient, but at the expense of making some common operations more difficult for users. Git really is a perfect 21st century illustration of the classic "Worse Is Better" paradigm of successful software 😀

Now I'm going to discuss how to do the opposite and merge separate repositories into one. On the face of it, this would seem a simpler task as Git has powerful support for merging...

Let's take the opposite of the example in my splitting apart article: say you have a main Git repo (ProjA) and a second repo (ProjB) in a subdirectory, i.e. ProjA/ProjB. You want to merge ProjB into ProjA and end up with a single master repo, ProjA, that retains all the history of both projects, with ProjB's files still in ProjA/ProjB.

Step 1: Temporarily Move ProjB

First of all, we need to move the ProjB repo out of the ProjA tree, so that Git will be able to overwrite ProjB when we merge the repos:

$ cd ProjA
$ mv ProjB <new location>
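Before you move it, it's worth a quick check that ProjB really is a repository in its own right and has nothing uncommitted, because only committed work comes across in the merge. Here's a minimal sketch of that check plus the move, run against a throwaway ProjA/ProjB pair in a temporary directory; the demo identity, the paths and `ProjB-moved` (standing in for `<new location>`) are all made up for illustration.

```shell
#!/bin/sh
set -eu
# Hypothetical identity so the demo commits work on any machine
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
projA="$work/ProjA"
newloc="$work/ProjB-moved"   # hypothetical stand-in for <new location>

# Throwaway ProjA with an independent ProjB repo nested inside it
git init -q "$projA"
git init -q "$projA/ProjB"
echo b > "$projA/ProjB/b.txt"
git -C "$projA/ProjB" add b.txt
git -C "$projA/ProjB" commit -q -m "ProjB initial"

# ProjB should be the top level of its own repo, not just a folder in ProjA
toplevel=$(git -C "$projA/ProjB" rev-parse --show-toplevel)
if [ "$toplevel" != "$(cd "$projA/ProjB" && pwd -P)" ]; then
    echo "ProjB is not a separate repository" >&2
    exit 1
fi

# Only committed work survives the merge, so insist on a clean ProjB
if [ -n "$(git -C "$projA/ProjB" status --porcelain)" ]; then
    echo "ProjB has uncommitted changes; commit or discard them first" >&2
    exit 1
fi

# The actual Step 1: move ProjB out of the ProjA tree
mv "$projA/ProjB" "$newloc"
```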

Step 2: Remove ProjB from .gitignore

You probably have ProjB in the .gitignore file for ProjA. That entry needs to be removed so you can work on ProjB after the merge; otherwise any new files you create under ProjB will be ignored.
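If you'd rather not hand-edit the file, a filter like the one below does the job. It's a sketch that assumes the entry is written on a line of its own as one of `ProjB`, `ProjB/`, `/ProjB` or `/ProjB/`; a wildcard rule such as `Proj*` would still match ProjB and would need fixing by hand. The sample .gitignore contents are invented, and in the real ProjA you'd follow this with `git add .gitignore` and a commit.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical ProjA .gitignore that currently ignores the nested ProjB repo
printf '%s\n' '*.o' 'ProjB/' 'build/' > .gitignore

# Remove only lines that are exactly one of the usual spellings of the entry
# (-x matches whole lines); every other rule is left untouched.
# "|| true" because grep exits 1 if it ends up printing no lines at all.
grep -v -x -e 'ProjB' -e 'ProjB/' -e '/ProjB' -e '/ProjB/' .gitignore > .gitignore.new || true
mv .gitignore.new .gitignore
cat .gitignore
```

With the sample file above, this prints the two remaining rules, `*.o` and `build/`.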

Step 3: Move ProjB files to ProjB/ProjB

If we just merge ProjB into ProjA, as in Step 4 below, all the ProjB files will end up in the root of ProjA. That's not what we want - we want them to go into the ProjB subdirectory after the merge. You would also likely have merge conflicts with common files like .gitignore.

Unfortunately this is the step where we see all the unpleasantness of Git 💩. If we just make a ProjB subdirectory and git mv all the files into it (as described in my earlier post on Git renaming), history is only partially retained. git log --follow lets you see the history of the moved files, but git diff, bisect etc. can't find the earlier revisions under the new paths. You can still diff ProjB commits from the ProjB log, just not for an individual ProjB file. Future Git versions may fix this. If you are not bothered by these issues, proceed to Step 4.
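To see the partial-history problem for yourself, here's a small sketch on a throwaway repo (file names and commit messages are invented). It does the simple git mv version of this step, moving every top-level tracked entry, dotfiles included, into ProjB/. Afterwards a plain path-limited log of ProjB/a.txt only finds the move commit, while `git log --follow` finds all three commits.

```shell
#!/bin/sh
set -eu
# Hypothetical identity so the demo commits work on any machine
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
projb="$work/ProjB"

# Throwaway ProjB: two commits touching a.txt, plus a .gitignore
git init -q "$projb"
echo v1 > "$projb/a.txt"
echo '*.tmp' > "$projb/.gitignore"
git -C "$projb" add a.txt .gitignore
git -C "$projb" commit -q -m "add a.txt"
echo v2 >> "$projb/a.txt"
git -C "$projb" commit -q -am "edit a.txt"

# git mv every top-level tracked entry into ProjB/ (ls-tree lists dotfiles
# but never .git; this loop assumes plain file names without odd characters)
mkdir "$projb/ProjB"
git -C "$projb" ls-tree --name-only HEAD | while IFS= read -r entry; do
    git -C "$projb" mv "$entry" ProjB/
done
git -C "$projb" commit -q -m "Move everything into ProjB/"

# Path-limited log without --follow: only the move commit shows up
git -C "$projb" log --oneline -- ProjB/a.txt
# With --follow, Git detects the rename and finds the older commits too
git -C "$projb" log --oneline --follow -- ProjB/a.txt
```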

However, to fix it properly we need to rewrite the commit history of ProjB so that it appears the files have always been in the ProjB subdirectory. Be aware that this is a destructive operation, so make sure you have a backup! There are many ways to do this in Git, and I recommend avoiding methods that run sed over file names - it's really, really easy to get that wrong. I prefer a more obviously correct method like this:

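$ cd ProjB
$ git filter-branch --prune-empty --tree-filter '
if [ ! -e ProjB ]; then
    mkdir -p ProjB
    # -z and -0 pass names NUL-separated, so spaces, quotes and
    # non-ASCII characters in file names survive intact
    git ls-tree -z --name-only "$GIT_COMMIT" | xargs -0 -I {} mv {} ProjB/
fi'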
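Because filter-branch rewrites history, I'd try it on a scratch repo first. The sketch below (throwaway repo, invented file names) runs essentially the tree filter above, with names passed NUL-separated, and then checks the result: the only top-level entry left is ProjB, and a plain path-limited log of ProjB/a.txt now finds both commits without needing --follow. Setting `FILTER_BRANCH_SQUELCH_WARNING=1` just silences the warning (and pause) that recent Git versions print whenever filter-branch runs.

```shell
#!/bin/sh
set -eu
# Hypothetical identity so the demo commits work on any machine
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
# Skip filter-branch's deprecation warning and its delay
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
projb="$work/ProjB"

# Throwaway ProjB: two commits touching a.txt, plus a .gitignore
git init -q "$projb"
echo v1 > "$projb/a.txt"
echo '*.tmp' > "$projb/.gitignore"
git -C "$projb" add a.txt .gitignore
git -C "$projb" commit -q -m "add a.txt"
echo v2 >> "$projb/a.txt"
git -C "$projb" commit -q -am "edit a.txt"

# In every commit, move all top-level entries (dotfiles too, never .git)
# into ProjB/; filter-branch is run from the top of the working tree
(
    cd "$projb"
    git filter-branch --prune-empty --tree-filter '
    if [ ! -e ProjB ]; then
        mkdir -p ProjB
        git ls-tree -z --name-only "$GIT_COMMIT" | xargs -0 -I {} mv {} ProjB/
    fi' HEAD
)
```

filter-branch keeps the pre-rewrite refs under refs/original/, which is handy if you need to back out. Git's own documentation now points people towards the separately installed git filter-repo tool instead; if you have it, its `--to-subdirectory-filter ProjB` option is intended for exactly this kind of rewrite, and again I'd try it on a scratch clone first.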

Step 4: Merge ProjB into ProjA

From here it's pretty straightforward: we just merge ProjB into ProjA.

Note that --allow-unrelated-histories is required so that Git will merge histories that don't share a common ancestor.

$ cd ProjA
$ git remote add ProjB <ProjB location>
$ git fetch ProjB
$ git merge --allow-unrelated-histories ProjB/master
$ git remote remove ProjB

or just

$ git pull --no-rebase --allow-unrelated-histories <ProjB location> master

Note that I'm only illustrating merging the master branch - if you have other branches these will have to be merged separately.
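Here's the whole of Step 4 run end to end on two throwaway repos (all names invented), with ProjB's files already under ProjB/ as they would be after Step 3. It also shows why the flag matters: without --allow-unrelated-histories, Git (since version 2.9) refuses the merge with a "refusing to merge unrelated histories" error. `--no-edit` just accepts the default merge commit message so nothing waits on an editor.

```shell
#!/bin/sh
set -eu
# Hypothetical identity so the demo commits work on any machine
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
projA="$work/ProjA"
projb="$work/ProjB"

# Throwaway ProjA on master
git init -q "$projA"
git -C "$projA" symbolic-ref HEAD refs/heads/master
echo a > "$projA/a.txt"
git -C "$projA" add a.txt
git -C "$projA" commit -q -m "ProjA initial"

# Throwaway ProjB on master, files already under ProjB/ (as after Step 3)
git init -q "$projb"
git -C "$projb" symbolic-ref HEAD refs/heads/master
mkdir "$projb/ProjB"
echo b > "$projb/ProjB/b.txt"
git -C "$projb" add ProjB/b.txt
git -C "$projb" commit -q -m "ProjB initial"

git -C "$projA" remote add ProjB "$projb"
git -C "$projA" fetch -q ProjB

# Without the flag, the merge is refused because there is no common ancestor
if git -C "$projA" merge -q --no-edit ProjB/master 2>/dev/null; then
    echo "unexpected: merge succeeded without --allow-unrelated-histories" >&2
fi

git -C "$projA" merge -q --no-edit --allow-unrelated-histories ProjB/master
git -C "$projA" remote remove ProjB
```

The result is an ordinary two-parent merge commit whose tree contains both a.txt and ProjB/b.txt.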

Step 5: (Optional) Remove ProjB Repo

After checking that the ProjA file structure and history are all good, you can remove the old ProjB repo.

If things are not good, which I must admit they were not the first time I did this, you can reset to just before the merge and try again.

One of the things I do like about Git 😀 is the ease of undoing (and redoing) changes. To undo, find the commit hash just before the merge with:

$ git reflog | head

Then rewind to the good point:

$ git reset --hard <commit hash>
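If the merge was the very last thing you did, there's a shortcut to hunting through the reflog: git merge records the commit you were on in ORIG_HEAD, and `HEAD@{1}` names the previous reflog entry. Both only point at the pre-merge commit immediately after the merge, so if you've done anything since, go back to `git reflog`. The sketch below (throwaway repos, invented names) merges an unrelated history and then rewinds with ORIG_HEAD; `before` is only recorded so the undo can be checked.

```shell
#!/bin/sh
set -eu
# Hypothetical identity so the demo commits work on any machine
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
projA="$work/ProjA"
projb="$work/ProjB"

# Throwaway ProjA and ProjB with unrelated histories
git init -q "$projA"
echo a > "$projA/a.txt"
git -C "$projA" add a.txt
git -C "$projA" commit -q -m "ProjA initial"

git init -q "$projb"
mkdir "$projb/ProjB"
echo b > "$projb/ProjB/b.txt"
git -C "$projb" add ProjB/b.txt
git -C "$projb" commit -q -m "ProjB initial"

before=$(git -C "$projA" rev-parse HEAD)

# Fetch ProjB's current branch and merge it
git -C "$projA" fetch -q "$projb" HEAD
git -C "$projA" merge -q --no-edit --allow-unrelated-histories FETCH_HEAD

# Show the latest reflog entries, then rewind to the pre-merge commit
git -C "$projA" reflog -n 3
git -C "$projA" reset -q --hard ORIG_HEAD
```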
