A few days ago I read Linus's response to the SHA-1 collision announced by Google. I knew that Git uses a SHA-1 hash to identify each object, but I was not sure how that hash was formed. So I spent a few hours digging into it, and below are the details for those interested.

Let’s start with an empty Git repo:

    mkdir /tmp/gittest
    cd /tmp/gittest
    git init
    Initialized empty Git repository in /private/tmp/gittest/.git/
    tree -a
    └── .git
        ├── HEAD
        ├── config
        ├── description
        ├── hooks
        │   ├── applypatch-msg.sample
        │   ├── commit-msg.sample
        │   ├── post-update.sample
        │   ├── pre-applypatch.sample
        │   ├── pre-commit.sample
        │   ├── pre-push.sample
        │   ├── pre-rebase.sample
        │   ├── prepare-commit-msg.sample
        │   └── update.sample
        ├── info
        │   └── exclude
        ├── objects
        │   ├── info
        │   └── pack
        └── refs
            ├── heads
            └── tags

As you can see, there are no objects in the repo yet. Let’s add some content and look at the objects again:
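
One way to add that content, as a minimal sketch: the file name `test.txt` is an assumption (it is not given in this post), while the content, the word test followed by a newline, is the content that produces the hash shown in the rest of this walkthrough.

```shell
# Continue in the repository created earlier.
repo=/tmp/gittest
mkdir -p "$repo" && cd "$repo"
git init -q   # harmless if the repository already exists

# Hypothetical file name; the content is "test" plus a trailing newline.
echo "test" > test.txt

# Staging is enough for Git to write the blob object; no commit is needed.
git add test.txt
```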

    tree -a ./.git/objects
    ├── 9d
    │   └── aeafb9864cf43055ae93beb0afd6c7d144bfa4
    ├── info
    └── pack

The folder name 9d comes from the first two hex digits of the SHA-1 hash; the remaining 38 digits form the filename of the object. To get back the content of the file we added, run the command below:

    # git cat-file prints the content of an object stored by Git;
    # the -p option pretty-prints it based on the object's type

    git cat-file -p 9daeafb9864cf43055ae93beb0afd6c7d144bfa4
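
The mapping from a hash to its path under `.git/objects` can be sketched in plain shell. The variable names here are illustrative only, and this covers loose objects like the one above, not objects stored in pack files.

```shell
# Object ID taken from the example above.
hash=9daeafb9864cf43055ae93beb0afd6c7d144bfa4

# The first two hex digits name the folder, the remaining 38 name the file.
dir=$(printf '%s' "$hash" | cut -c1-2)
file=$(printf '%s' "$hash" | cut -c3-)

object_path=".git/objects/$dir/$file"
echo "$object_path"
```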

Now, how is this hash computed? According to the “Object storage format” section of the Git documentation, the SHA-1 is taken over “<ascii type without space> + <space> + <ascii decimal size> + <byte\0> + <binary object data>”. For the content of a file, the type is blob.

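    # get the object type using the -t option (blob for file content)
    TYPE=$(git cat-file -t 9daeafb9864cf43055ae93beb0afd6c7d144bfa4)

    # get the object size in bytes using the -s option
    SIZE=$(git cat-file -s 9daeafb9864cf43055ae93beb0afd6c7d144bfa4)

    # get the object content using the -p option
    # (command substitution strips the trailing newline, so printf adds it back)
    CONTENT=$(git cat-file -p 9daeafb9864cf43055ae93beb0afd6c7d144bfa4)

    # printf emits the NUL byte reliably; a plain echo may print a literal \0
    printf '%s %s\0%s\n' "$TYPE" "$SIZE" "$CONTENT" | sha1sum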
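
The same header can also be built without asking Git for any of the pieces. The following is a rough sketch, assuming `sha1sum` is installed (on some systems the equivalent tool is `shasum`); the helper name `git_blob_hash` and the temporary file are hypothetical and only serve the illustration.

```shell
# Hypothetical helper: prints the Git blob ID of a file without calling Git.
git_blob_hash() {
    file=$1
    # Size in bytes; the arithmetic expansion trims padding some wc versions add.
    size=$(( $(wc -c < "$file") ))
    # Hash the header "blob <size>\0" followed by the raw bytes of the file.
    { printf 'blob %s\0' "$size"; cat "$file"; } | sha1sum | cut -d' ' -f1
}

# Example with a temporary file containing "test" and a trailing newline.
tmpfile=$(mktemp)
echo "test" > "$tmpfile"
blob_id=$(git_blob_hash "$tmpfile")
echo "$blob_id"
rm -f "$tmpfile"
```

Because the file is streamed through `cat` rather than captured in a variable, trailing newlines and binary content are hashed byte for byte. For the content used in this post, the result should be 9daeafb9864cf43055ae93beb0afd6c7d144bfa4, the same ID Git assigned to the object.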

You can also compute the hash directly with the “git hash-object” command, as shown below:

    echo "test" | git hash-object --stdin