
Extracting Multiple Subtrees from a Git Repository

Submitted by h2b on 17. November 2015 - 19:54



The task: extract multiple subtrees of a Git repository and transfer them to a new repository while keeping their version history.

Git provides different approaches to this task. Here, we use the git-subtree command to create branches from the subtrees to be detached. These branches are then imported into another repository. Sources were the manpage (man git-subtree) and this answer on Stack Overflow, along with some of the comments listed there.

Let's start with a Git repository bigrepo.git whose directory structure contains, among others, the subtrees src/main/lib and src/test/lib.

Suppose it turned out that the files below lib comprise a library that is useful not only for this project but for general purposes. This library should therefore become a project of its own, and the lib subtrees below main and test shall be moved to a separate repository.

For the following steps, we first need a working copy of the repository. So, if bigrepo.git is a bare repository, we create a working copy in the current directory with

    git clone /srv/git/bigrepo.git

(provided that bigrepo.git resides below /srv/git). If the repository is not bare, it is still advisable to work on a copy, which simplifies the clean-up at the end.

Now we change into the clone we just created and create two new branches, one for each subtree to extract (say, split-lib-main and split-lib-test), using the split subcommand of git-subtree:

    cd bigrepo

    git subtree split -P src/main/lib -b split-lib-main

    git subtree split -P src/test/lib -b split-lib-test

Each new branch contains all files and directories below the path given by the -P option, together with their complete version history. The path given by -P must not contain leading or trailing slashes. The -b option sets the name of the new branch, which can be chosen freely.

With this procedure, however, the path prefixes of the files in the new branches are lost. For instance, the files and directories in split-lib-main retain no information about their original parent directory src/main/lib. The directory structure below that prefix remains intact.

Keeping this in mind, we continue by creating the new repository, let's call it librepo, that will contain only the extracted subtrees. For simplicity, we create it at the same directory level as bigrepo. Since we are still in bigrepo, we do the following:
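
The loss of the path prefix can be checked on a scratch repository. The following is a minimal sketch, not part of the original procedure: the file names (Helper.txt, App.txt), the commit message, and the demo identity are made up for illustration, and it assumes that git-subtree is installed alongside git.

```shell
#!/bin/sh
set -eu

# Throw-away identity so that commits work in a clean environment (assumed values).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.invalid"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.invalid"

# Scratch repository with one hypothetical file inside and one outside the subtree.
scratch=$(mktemp -d)
cd "$scratch"
git init -q bigrepo
cd bigrepo
mkdir -p src/main/lib/util
echo "helper" > src/main/lib/util/Helper.txt
echo "app" > src/main/App.txt
git add .
git commit -q -m "Initial import"

# Same split command as in the article.
git subtree split -P src/main/lib -b split-lib-main >/dev/null

# Compare the paths on the original branch with the paths on the split branch.
original_paths=$(git ls-tree -r --name-only HEAD)
split_paths=$(git ls-tree -r --name-only split-lib-main)
printf 'original:\n%s\n\nsplit-lib-main:\n%s\n' "$original_paths" "$split_paths"
```

In the split branch the file is listed relative to src/main/lib, so the prefix has to be supplied again when the branch is imported elsewhere.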

    cd ..

    mkdir librepo

    cd librepo

    git init

This gives us a new repository that can take up the files extracted above. Unfortunately, the command we will use next does not work on an empty repository. Therefore, we first commit some arbitrary file, say one that we will need in the future anyway:

    touch .gitignore

    git add .gitignore

    git commit -m "Ignore file added." .gitignore
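
Whether a repository already has a commit can be checked with git rev-parse. The sketch below runs in a scratch directory (an assumption of this example, as is the demo identity) and shows that HEAD does not resolve right after git init but does once the placeholder commit exists.

```shell
#!/bin/sh
set -eu

# Throw-away identity so that the commit works in a clean environment (assumed values).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.invalid"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.invalid"

scratch=$(mktemp -d)
cd "$scratch"
git init -q librepo
cd librepo

# Right after init there is no commit, so HEAD cannot be resolved.
if git rev-parse --verify -q HEAD >/dev/null; then
    head_before=present
else
    head_before=missing
fi

# The placeholder commit from the article.
touch .gitignore
git add .gitignore
git commit -q -m "Ignore file added."

if git rev-parse --verify -q HEAD >/dev/null; then
    head_after=present
else
    head_after=missing
fi

echo "before: $head_before, after: $head_after"
```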

Now, we can complete the extraction:

    git subtree add -P src/main/lib ../bigrepo split-lib-main

    git subtree add -P src/test/lib ../bigrepo split-lib-test

With -P we restore the path prefix that had been lost; if needed, we could specify any other prefix instead.

If there are additional files that belong to librepo but cannot be extracted in this way (e.g., configuration files for Maven, or license and README files from the project's root directory), add them manually. The version history of those files is lost, but in this case that should not be too problematic.

Our new librepo now contains all files and directories that are stored in bigrepo below src/main/lib and src/test/lib, including their version history. We can create a bare repository from this by the usual Git commands and put it at any location.
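
The whole procedure can be tried end to end in a scratch directory before touching a real repository. This is only a sketch under stated assumptions: the file names (Lib.txt, LibTest.txt, App.txt), the commit message of the source repository, and the demo identity are hypothetical, and git-subtree must be installed.

```shell
#!/bin/sh
set -eu

# Throw-away identity so that commits work in a clean environment (assumed values).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.invalid"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.invalid"

scratch=$(mktemp -d)
cd "$scratch"

# Source repository with hypothetical library files and one unrelated file.
git init -q bigrepo
cd bigrepo
mkdir -p src/main/lib src/test/lib src/main/app
echo "library code" > src/main/lib/Lib.txt
echo "library test" > src/test/lib/LibTest.txt
echo "application" > src/main/app/App.txt
git add .
git commit -q -m "Add library and application"

# Step 1: one branch per subtree.
git subtree split -P src/main/lib -b split-lib-main >/dev/null
git subtree split -P src/test/lib -b split-lib-test >/dev/null

# Step 2: new repository with a first commit.
cd ..
git init -q librepo
cd librepo
touch .gitignore
git add .gitignore
git commit -q -m "Ignore file added."

# Step 3: import both branches under their old prefixes.
git subtree add -P src/main/lib ../bigrepo split-lib-main >/dev/null
git subtree add -P src/test/lib ../bigrepo split-lib-test >/dev/null

# Result: tracked files and the commit subjects now present in librepo.
librepo_files=$(git ls-files)
librepo_subjects=$(git log --format=%s)
echo "$librepo_files"
echo "$librepo_subjects"
```

The file list should show only .gitignore and the two lib subtrees, and the log should include the commit message from the source repository next to the commits created by git subtree add.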
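
One common way to obtain the bare repository is git clone --bare. The sketch below uses a scratch repository with an empty placeholder commit as a stand-in for librepo; the target name librepo.git and the demo identity are assumptions of this example.

```shell
#!/bin/sh
set -eu

# Throw-away identity so that the commit works in a clean environment (assumed values).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.invalid"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.invalid"

# Stand-in for the finished librepo.
scratch=$(mktemp -d)
cd "$scratch"
git init -q librepo
git -C librepo commit -q --allow-empty -m "Placeholder commit"

# Bare copy that can then be moved to its final location.
git clone -q --bare librepo librepo.git

is_bare=$(git -C librepo.git rev-parse --is-bare-repository)
echo "bare: $is_bare"
```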

At this point, git-subtree even offers the opportunity to keep the component parts in both repositories (bigrepo and librepo) and work on them concurrently. However, this is beyond the problem definition elaborated here and might cause consistency problems anyhow. Thus, this scenario is not considered here further.

What remains is the clean-up. Since we made a copy of the original repository in the beginning, we can simply throw that copy away. Alternatively, we can push the split branches to the original repository for reference. In any case, we might want to delete the extracted components from the original repository (after convincing ourselves that everything is in librepo, of course) to keep the two copies from drifting apart.
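
Removing the extracted directories from the original repository could look like the following sketch. It builds a hypothetical stand-in for bigrepo.git in a scratch directory, so the file names, commit messages, and demo identity are assumptions. In a real working copy, the split branches could be pushed beforehand with git push origin split-lib-main split-lib-test.

```shell
#!/bin/sh
set -eu

# Throw-away identity so that commits work in a clean environment (assumed values).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.invalid"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.invalid"

scratch=$(mktemp -d)
cd "$scratch"

# Stand-in for the original bare repository (hypothetical content).
git init -q seed
cd seed
mkdir -p src/main/lib src/test/lib src/main/app
echo "library code" > src/main/lib/Lib.txt
echo "library test" > src/test/lib/LibTest.txt
echo "application" > src/main/app/App.txt
git add .
git commit -q -m "Add library and application"
cd ..
git clone -q --bare seed bigrepo.git

# Working copy in which the extracted directories are removed.
git clone -q bigrepo.git bigrepo
cd bigrepo
git rm -r -q src/main/lib src/test/lib
git commit -q -m "Remove lib, now maintained in librepo"
git push -q origin HEAD

# Files that remain in the original repository after the push.
remaining=$(git -C ../bigrepo.git ls-tree -r --name-only HEAD)
echo "$remaining"
```

The removed files stay reachable in the history of bigrepo, so nothing is lost if the deletion turns out to be premature.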