Git Internals

This post is the first part of a series on the inner workings of Git.

The Git Internals series

The entire Git Internals series is also available as a talk. Feel free to watch the talk instead.

The .git Directory

On executing the git init command in a directory, Git creates a hidden .git directory in that directory. The .git directory contains all the project history data on which Git can perform its version control functions. It also contains files to configure the way Git handles things for that particular repository.

The .git Directory Contents

.git
├───addp-hunk-edit.diff
├───COMMIT_EDITMSG
├───config
├───description
├───FETCH_HEAD
├───HEAD
├───hooks
│   └───<*.sample>
├───index
├───info
│   └───exclude
├───lfs
│   ├───cache
│   │   └───locks
│   │       └───refs
│   │           └───heads
│   │               └───<branch_names>
│   │                   └───verifiable
│   ├───objects
│   │   └───<first_2_SHA-256_characters>
│   │       └───<next_2_SHA-256_characters>
│   │           └───<entire_64_character_SHA-256_hash>
│   └───tmp
├───logs
│   ├───HEAD
│   └───refs
│       ├───heads
│       │   └───<branch_names>
│       ├───remotes
│       │   └───<remote_aliases>
│       │       └───<branch_names>
│       └───stash
├───MERGE_HEAD
├───MERGE_MODE
├───MERGE_MSG
├───objects
│   ├───<first_2_SHA-1_characters>
│   │   └───<remaining_38_SHA-1_characters>
│   ├───info
│   └───pack
│       ├───<*.idx>
│       └───<*.pack>
├───ORIG_HEAD
├───packed-refs
├───rebase-merge
│   ├───git-rebase-todo
│   ├───git-rebase-todo.backup
│   ├───head-name
│   ├───interactive
│   ├───no-reschedule-failed-exec
│   ├───onto
│   └───orig-head
└───refs
    ├───heads
    │   └───<branch_names>
    ├───remotes
    │   └───<remote_aliases>
    │       └───<branch_names>
    ├───stash
    └───tags
        └───<tag_names>

The index File

NOTE: The words ‘index’, ‘stage’ and ‘cache’ are the same in Git and are used interchangeably.

  • It is created when files are added for the first time and is updated every time the git add command is executed.

Index file explained

  • It is a binary file, so simply printing its contents using cat .git/index will result in gibberish. Its contents can be inspected using the git ls-files --stage plumbing command.

'git ls-files --stage' command

  • From the image above

    • 100644 is the mode of the file. It is an octal number.

      Octal: 100644
      Binary: 1000 000 110100100

      • The first four binary bits (1000) indicate the object type: 1000 for a regular file, 1010 for a symbolic link and 1110 for a gitlink.
      • The next three binary bits (000) are unused.
      • The last nine binary bits (110100100) indicate Unix file permissions.
        • 644 and 755 are valid for regular files.
        • Symlinks and gitlinks have the value 0 in this field.
    • The next 40-character hexadecimal string is the SHA-1 hash of the blob object that holds the file's contents.

    • The next number is a stage number/slot, which is useful during merge conflict handling.

      • 0 indicates a normal un-conflicted file.
      • 1 indicates the base, i.e., the common ancestor version of the file.
      • 2 indicates the ‘ours’ version, i.e., the version from HEAD (the current branch).
      • 3 indicates the ‘theirs’ version, i.e., the file with the incoming changes.
    • The last string is the name of the file being referred to.

  • More on the index file contents.

  • More on the index file usage.
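
The fields described above can be pulled apart programmatically. The following is a minimal Python sketch, assuming the output format of git ls-files --stage is "<mode> <hash> <stage>", a tab, then the path. The names parse_stage_line and IndexEntry, and the sample line, are made up for illustration and are not part of Git.

```python
from typing import NamedTuple

# Known object-type values of the mode field.
OBJECT_TYPES = {0b1000: "regular file", 0b1010: "symbolic link", 0b1110: "gitlink"}


class IndexEntry(NamedTuple):
    object_type: str
    permissions: str  # octal string such as "644"
    sha1: str
    stage: int
    path: str


def parse_stage_line(line: str) -> IndexEntry:
    """Parse one line of `git ls-files --stage` output (hypothetical helper)."""
    meta, path = line.rstrip("\n").split("\t", 1)
    mode_text, sha1, stage_text = meta.split(" ")
    mode = int(mode_text, 8)   # the mode is printed as an octal number
    type_bits = mode >> 12     # bits above the unused and permission bits
    perm_bits = mode & 0o777   # the last nine bits: Unix file permissions
    return IndexEntry(
        object_type=OBJECT_TYPES.get(type_bits, "unknown"),
        permissions=format(perm_bits, "03o"),
        sha1=sha1,
        stage=int(stage_text),
        path=path,
    )


# Made-up sample line; the hash is the well-known hash of an empty blob.
entry = parse_stage_line("100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0\tREADME.md")
print(entry)
```

A stage value other than 0 in the parsed entry would mean the path is in a merge conflict.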

The HEAD File

  • It is used to refer to the latest commit in the current branch.

  • Usually it does not contain a commit SHA-1 hash itself. Instead, it contains the path to a file in the refs directory, named after the current branch, which stores the SHA-1 hash of the latest commit in that branch.

  • It contains a commit’s SHA-1 hash when a specific commit or tag is checked out. (Detached HEAD state.)

  • More on the HEAD file.

  • E.g.:

    # in the 'main' branch
    $ cat .git/HEAD
    ref: refs/heads/main
    $ git switch test_branch
    Switched to branch 'test_branch'
    $ cat .git/HEAD
    ref: refs/heads/test_branch
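
The two states of the HEAD file (pointing at a branch file, or holding a hash directly) can be told apart with a few lines of code. This is a rough Python sketch, not how Git itself does it; the function name read_head, the branch name and the hash are placeholders invented for the example, and the demo runs against a throwaway directory rather than a real repository.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def read_head(git_dir: Path) -> Tuple[Optional[str], Optional[str]]:
    """Return (ref_name, commit_hash) for the HEAD file in git_dir.

    ref_name is None in the detached HEAD state. commit_hash is None when the
    branch file is not present as a file in the refs directory.
    """
    content = (git_dir / "HEAD").read_text().strip()
    if content.startswith("ref: "):
        ref_name = content[len("ref: "):]
        ref_file = git_dir / ref_name
        commit = ref_file.read_text().strip() if ref_file.is_file() else None
        return ref_name, commit
    return None, content  # detached HEAD: the file holds the hash itself


# Build a throwaway directory that mimics the layout described above.
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    fake_hash = "1234567890abcdef1234567890abcdef12345678"  # placeholder
    (git_dir / "refs" / "heads" / "main").write_text(fake_hash + "\n")

    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    on_branch = read_head(git_dir)

    (git_dir / "HEAD").write_text(fake_hash + "\n")
    detached = read_head(git_dir)

print(on_branch)
print(detached)
```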
    

The refs Directory

.git
├───...
└───refs
    ├───heads
    │   └───<branch_name(s)>
    ├───remotes
    │   └───<remote_alias(es)>
    │       └───<branch_name(s)>
    ├───stash
    └───tags
        └───<tag_name(s)>
  • This directory holds the reference to the latest commit in every local branch and fetched remote branch in the form of the SHA-1 hash of the commit.
  • It also stores the SHA-1 hash of every commit that has been tagged.
  • The HEAD file references a file in the heads directory of this (refs) directory. That file is named after the branch that is currently checked out.

The packed-refs File

  • One file is created per branch and tag in the refs directory.
  • In a repository with a lot of branches and tags, there is a huge number of refs, and many of them are not actively used or changed.
  • These refs occupy a lot of storage space and cause performance issues.
  • The git pack-refs command is used to solve this problem. It stores all the refs in a single file called packed-refs.

Print the packed-refs file

  • If a ref is missing from the usual refs directory after packing, it is looked up in this file and used if found.
  • Subsequent updates to a packed branch ref create a new file in the refs directory as usual.
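
The lookup order described above (the file in the refs directory first, then packed-refs) might be sketched in Python as follows. This assumes each entry line in packed-refs has the form "<hash> <ref name>", with lines starting with # (the header) or ^ (the peeled target of an annotated tag) skipped. The helper names and the placeholder hashes are invented for the example.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional


def parse_packed_refs(text: str) -> Dict[str, str]:
    """Map ref names to hashes from the text of a packed-refs file."""
    refs = {}
    for line in text.splitlines():
        # Skip blank lines, the '#' header and '^' peeled-tag lines.
        if not line or line[0] in "#^":
            continue
        sha, name = line.split(" ", 1)
        refs[name] = sha
    return refs


def lookup_ref(git_dir: Path, name: str) -> Optional[str]:
    """Prefer the file under refs; fall back to packed-refs."""
    loose = git_dir / name
    if loose.is_file():
        return loose.read_text().strip()
    packed = git_dir / "packed-refs"
    if packed.is_file():
        return parse_packed_refs(packed.read_text()).get(name)
    return None


# Throwaway directory with made-up hashes: 'main' was packed and later
# updated (so a newer file exists under refs), the tag exists only in
# packed-refs.
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "packed-refs").write_text(
        "# pack-refs with: peeled fully-peeled sorted \n"
        f"{'a' * 40} refs/heads/main\n"
        f"{'b' * 40} refs/tags/v1.0\n"
        f"^{'c' * 40}\n"
    )
    (git_dir / "refs" / "heads" / "main").write_text("d" * 40 + "\n")

    main_hash = lookup_ref(git_dir, "refs/heads/main")
    tag_hash = lookup_ref(git_dir, "refs/tags/v1.0")
    missing = lookup_ref(git_dir, "refs/heads/nope")

print(main_hash, tag_hash, missing)
```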

The logs Directory

.git
├───...
└───logs
    ├───HEAD
    └───refs
        ├───heads
        │   └───<branch_name(s)>
        ├───remotes
        │   └───<remote_alias(es)>
        │       └───<branch_name(s)>
        └───stash
  • Contains the history of every update to HEAD and to each ref (the reflogs), in chronological order.

Print a branch's log file

  • Every row consists of, in order: the ref's previous SHA-1 hash (the parent commit's hash for an ordinary commit), the ref's new SHA-1 hash (the current commit), the committer's name and e-mail, the Unix epoch time of the action, the time zone, and the type of action along with its message.
  • There are logs for every branch in the local Git repository and for the fetched branches from the remote Git repository/repositories (if any).
  • Inside the logs directory
    • The HEAD file records every operation that changes HEAD, such as branch switches, commits, rebases, etc.
    • The files in the refs directory only include branch specific operations and history, such as commits, pulls, resets, rebases, etc.
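
To make the row layout concrete, here is a small Python sketch that splits one log line into its fields. It assumes the fields are separated by single spaces and that a tab precedes the action and message. The sample line (name, e-mail, hashes, timestamp) is entirely made up, and parse_reflog_line is a hypothetical helper, not a Git API.

```python
import re
from datetime import datetime, timedelta, timezone

REFLOG_LINE = re.compile(
    r"^(?P<old>[0-9a-f]{40}) (?P<new>[0-9a-f]{40}) "
    r"(?P<name>.*) <(?P<email>[^>]*)> "
    r"(?P<timestamp>\d+) (?P<tz>[+-]\d{4})"
    r"(?:\t(?P<message>.*))?$"
)


def parse_reflog_line(line: str) -> dict:
    """Split one row of a file under .git/logs into named fields."""
    match = REFLOG_LINE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"not a log row: {line!r}")
    entry = match.groupdict()
    # Turn the "+hhmm" / "-hhmm" time zone into a UTC offset.
    sign = -1 if entry["tz"][0] == "-" else 1
    offset = sign * timedelta(hours=int(entry["tz"][1:3]), minutes=int(entry["tz"][3:5]))
    entry["when"] = datetime.fromtimestamp(int(entry["timestamp"]), timezone(offset))
    return entry


# Made-up row: an all-zero first hash marks a ref that did not exist before.
sample = (
    "0" * 40 + " " + "1234567890abcdef1234567890abcdef12345678"
    " Jane Doe <jane@example.com> 1700000000 +0530"
    "\tcommit (initial): Add README"
)
entry = parse_reflog_line(sample)
print(entry["name"], entry["when"].isoformat(), entry["message"])
```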

The FETCH_HEAD File

  • It contains the latest commits of the fetched remote branch(es).

  • It corresponds to the branch which was

The COMMIT_EDITMSG File

  • The commit message is written in this file.
  • This file is opened in an editor on executing the git commit command.
  • It contains the output of the git status command commented out using the # character.
  • If there has been a commit before, then this file will show the last commit message along with the git status output just before that commit.

The objects Directory

.git
├───...
└───objects
    ├───<first_2_SHA-1_characters>
    │   └───<remaining_38_SHA-1_characters>
    ├───info
    └───pack
        ├───<*.idx>
        └───<*.pack>
  • The most important directory in the .git directory.
  • It houses all the Blob, Commit and Tree Objects in the repository, each stored in a file named after the object's SHA-1 hash.
  • To decrease access time, objects are placed in buckets (directories), with the first two characters of their SHA-1 hash as the name of the bucket. The remaining 38 characters are used to name the object’s file.
  • More on the pack directory.
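
The bucket layout can be reproduced outside Git. The Python sketch below computes the SHA-1 hash of a blob and derives the path where the object would be stored. It relies on two details not spelled out above: the hash is taken over a "blob <size>" header, a NUL byte, and then the content, and the file on disk holds that data zlib-compressed. The function names are invented for the example.

```python
import hashlib
import zlib
from pathlib import PurePosixPath


def blob_object(content: bytes) -> bytes:
    """Prefix the content with the header that Git hashes along with it."""
    return b"blob %d\x00" % len(content) + content


def object_id(obj: bytes) -> str:
    """SHA-1 hash of the full object (header plus content), as hex."""
    return hashlib.sha1(obj).hexdigest()


def loose_object_path(sha1: str) -> PurePosixPath:
    """First 2 characters name the bucket, the remaining 38 name the file."""
    return PurePosixPath(".git/objects") / sha1[:2] / sha1[2:]


obj = blob_object(b"test content\n")
sha1 = object_id(obj)
stored = zlib.compress(obj)  # what the file on disk would contain

print(sha1)
print(loose_object_path(sha1))
```

In a real repository, running git hash-object on a file with the same content should print the same hash.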

The info Directory

.git
├───...
└───info
    └───exclude

The config File

The addp-hunk-edit.diff File

  • Created when the e (edit) option is chosen in the git add --patch command.
  • Enables the manual edit of a hunk of a file to be staged.

The ORIG_HEAD File

  • It contains the SHA-1 hash of a commit.
  • It is the previous state of the HEAD, but not necessarily the immediate previous state.
  • It is set by certain commands which have destructive/dangerous behaviour (such as git reset, git merge and git rebase), so it usually points to the commit HEAD was on just before the most recent such operation.
  • It is less useful now because the git reflog command makes reverting/resetting to a particular commit easier.

The description File

  • This is the description of the repository.
  • This file is used by GitWeb, which hardly anyone uses today, so it can be left alone.