      Cleaning up an old Git repo

      Timothée Jaussoin · Thursday, 24 March, 2016 - 12:53 · 2 minutes

    One of the main features of Git is that it keeps track of everything that has ever been committed… everything. This can become an issue if, at some point in the past, someone committed huge files. Even if these files have since been removed, they are still kept in the history of the Git repository. In the end, all these old files can slow down many of your Git operations.

    In this little tutorial we will see how we can find these files and delete them properly by rewriting the Git history.

    Step 0: Prepare your Git repository

    Before doing the cleanup, make sure that you have cloned the whole repository, including all of its branches. You can use this small bash script that I have found here to help you.

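    #!/bin/bash
    # Create a local tracking branch for every remote branch (HEAD and master excluded).
    for branch in $(git branch -a | grep remotes | grep -v HEAD | grep -v master); do
        # Strip the leading "remotes/<remote>/" so names such as feature/foo stay intact.
        git branch --track "${branch#remotes/*/}" "$branch"
    done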
    

    Step 1: Export and filter the big files

    From the root of the Git repository that you want to clean up, we will first export a list of all the objects stored in its pack files. Don't worry if this takes a while to process. In a bare repository, drop the .git/ prefix from the path.

    $ git verify-pack -v .git/objects/pack/*.idx > files.txt
    

    From this list we can then filter out what we are looking for. Here is the magic command (it relies on Bash process substitution).

    $ join -o "1.1 1.2 2.3" <(git rev-list --objects --all | sort) <(sort -k3 -g files.txt | tail -5 | sort) | sort -k3 -g
    

    This command extracts the 5 biggest objects from the Git repository, with their paths and their sizes in bytes. It will give you something like this.

    bca72b793ab3db0e423a1865ee7cae7e273eca94 Assets/Art/Textures/Big_File.psd 258713292
    633db8bdc72d227ca2e054fd006dac4091078a2d Assets/Art/Environment/Textures/Im_So_Big.psd 260855564
    208e445678928260fbac309ee3ba522e3fd84f50 Assets/Art/Textures/What_A_Big_File.psd 290092325
    9b2bffb966216587ee14fa24e74e663fa0eff5de Assets/Art/Environment/Textures/Wow_Im_So_Huge.psd 301903493
    47895d134b5d228f97fb9b279aafe3d1346a4a20 Assets/Art/Environment/Textures/Im_Bigger_Than_You_Think.psd 353411556
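
    To see what each stage of the command from Step 1 does, here is a small sketch that runs the same pipeline on made-up data. The hashes, paths and sizes are invented for the illustration, temporary files stand in for the two Git commands, and only the 2 largest entries are kept instead of 5.

```shell
#!/bin/sh
# Stage-by-stage version of the pipeline, run on made-up sample data.
tmp=$(mktemp -d)

# Stand-in for `git rev-list --objects --all`: "<sha> <path>" per line.
cat > "$tmp/objects.txt" <<'EOF'
ccc3 src/main.c
aaa1 README.md
bbb2 Assets/Big_File.psd
EOF

# Stand-in for files.txt: "<sha> <type> <size> <packed-size> <offset>".
cat > "$tmp/files.txt" <<'EOF'
ccc3 blob 1200 400 12
aaa1 blob 300 150 412
bbb2 blob 258713292 250000000 562
EOF

# 1. Sort the object list by SHA, because join needs sorted input.
sort "$tmp/objects.txt" > "$tmp/objects.sorted"
# 2. Keep the 2 largest entries (column 3 is the size), then re-sort by SHA.
sort -k3 -g "$tmp/files.txt" | tail -2 | sort > "$tmp/largest.sorted"
# 3. Join on the SHA, print SHA, path and size, and order the result by size.
largest=$(join -o "1.1 1.2 2.3" "$tmp/objects.sorted" "$tmp/largest.sorted" | sort -k3 -g)

printf '%s\n' "$largest"
rm -rf "$tmp"
```

    The smallest entry (README.md) is dropped by tail, and the two remaining lines come out ordered by size, the biggest one last, just like in the listing above.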
    

    Step 2: Remove the files

    If you are sure that these files are no longer relevant and can be removed properly from the repository, let's nuke them one by one!

    $ git filter-branch --tag-name-filter cat --index-filter 'git rm -r --cached --ignore-unmatch filename' --prune-empty -f -- --all
    

    Replace filename with the path of the file and wait for Git to rewrite everything properly.
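
    Rewriting the history does not shrink your local copy by itself: git filter-branch keeps backup references under refs/original/, and the reflog still points at the old commits. Here is a minimal sketch of the usual cleanup. It is not part of the original recipe, and reclaim_space is a hypothetical helper name chosen for this example.

```shell
#!/bin/sh
# reclaim_space is a hypothetical helper; pass it the path of the
# repository that was just rewritten with git filter-branch.
reclaim_space() {
    repo=${1:-.}
    # Delete the backup refs that filter-branch stores under refs/original/.
    git -C "$repo" for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git -C "$repo" update-ref -d "$ref"
        done
    # Expire every reflog entry so the old commits become unreachable.
    git -C "$repo" reflog expire --expire=now --all
    # Repack and drop the unreachable objects right away.
    git -C "$repo" gc --prune=now --aggressive
}
```

    Call it with reclaim_space . from the root of the repository once you are happy with the rewritten history, since the backups are gone afterwards. The remote typically keeps the old objects until it runs its own garbage collection.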

    Step 3: Push and tell everyone about your changes

    It's time to push your changes!

    $ git push origin --force --all
    $ git push origin --force --tags
    

    Now you have to tell all the contributors to rebase their local copies or to get a fresh clone of the repository. That is the tricky part, but it is mandatory if you want everyone to have the cleaned-up version ;)
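
    For a contributor who has no local work to keep, getting a fresh version does not necessarily mean cloning again. A minimal sketch, assuming the remote is called origin: resync_branch is a hypothetical helper name, and it discards everything that exists only locally on the given branch.

```shell
#!/bin/sh
# resync_branch is a hypothetical helper: it replaces the local state of one
# branch with the rewritten version found on the remote.
# WARNING: uncommitted and unpushed work on that branch is lost.
resync_branch() {
    branch=$1
    remote=${2:-origin}
    # Download the rewritten history, switch to the branch, then move it
    # onto the remote version, dropping the old local commits.
    git fetch "$remote" &&
    git checkout "$branch" &&
    git reset --hard "$remote/$branch"
}
```

    A contributor would run something like resync_branch master inside their clone, once for each branch they have checked out locally.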