Git Internals
This post is the first part of a series on the internal workings of Git.
The Git Internals series
- The .git Directory (This post)
- Git Objects
The entire Git Internals series is available as a talk as well. Feel free to watch the talk instead.
The .git Directory
On executing the git init command in a directory, Git creates a hidden .git directory in that directory. The .git directory contains all the project history data on which Git can perform its version control functions. It also contains files to configure the way Git handles things for that particular repository.
The .git Directory Contents
.git
├───addp-hunk-edit.diff
├───COMMIT_EDITMSG
├───config
├───description
├───FETCH_HEAD
├───HEAD
├───hooks
│   └───<*.sample>
├───index
├───info
│   └───exclude
├───lfs
│   ├───cache
│   │   └───locks
│   │       └───refs
│   │           └───heads
│   │               └───<branch_names>
│   │                   └───verifiable
│   ├───objects
│   │   └───<first_2_SHA-256_characters>
│   │       └───<next_2_SHA-256_characters>
│   │           └───<entire_64_character_SHA-256_hash>
│   └───tmp
├───logs
│   ├───HEAD
│   └───refs
│       ├───heads
│       │   └───<branch_names>
│       ├───remotes
│       │   └───<remote_aliases>
│       │       └───<branch_names>
│       └───stash
├───MERGE_HEAD
├───MERGE_MODE
├───MERGE_MSG
├───objects
│   ├───<first_2_SHA-1_characters>
│   │   └───<remaining_38_SHA-1_characters>
│   ├───info
│   └───pack
│       ├───<*.idx>
│       └───<*.pack>
├───ORIG_HEAD
├───packed-refs
├───rebase-merge
│   ├───git-rebase-todo
│   ├───git-rebase-todo.backup
│   ├───head-name
│   ├───interactive
│   ├───no-reschedule-failed-exec
│   ├───onto
│   └───orig-head
└───refs
    ├───heads
    │   └───<branch_names>
    ├───remotes
    │   └───<remote_aliases>
    │       └───<branch_names>
    ├───stash
    └───tags
        └───<tag_names>
The index File
- This file contains the details of staged (added) files and is the staging area of the repository.
  NOTE: The words ‘index’, ‘stage’ and ‘cache’ are the same in Git and are used interchangeably.
- It is created when files are added for the first time and is updated every time the git add command is executed.
- It is a binary file, so just printing its contents using cat .git/index will result in gibberish. Its contents can be accessed using the git ls-files --stage plumbing command.
From the image above (the output of git ls-files --stage), 100644 is the mode of the file. It is an octal number.
Octal: 10 0 644
Binary: 001000 000 110100100
- The first six binary bits (the first two octal digits) indicate the object type. The leading two bits are always zero, so the type itself is a four-bit value.
  - 001000 indicates a regular file. (As seen in this case.)
  - 001010 indicates a symlink (symbolic link).
  - 001110 indicates a gitlink.
- The next three binary bits (000) are unused.
- The last nine binary bits (110100100) indicate Unix file permissions.
  - 644 and 755 are valid for regular files.
  - Symlinks and gitlinks have the value 0 in this field.
The next 40 character hexadecimal string is the SHA-1 hash of the file.
The next number is a stage number/slot, which is useful during merge conflict handling.
- 0 indicates a normal un-conflicted file.
- 1 indicates the base, i.e., the original version of the file.
- 2 indicates the ‘ours’ version, i.e., the version on the current branch (HEAD).
- 3 indicates the ‘theirs’ version, i.e., the file with the incoming changes.
The last string is the name of the file being referred to.
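The field layout described above can be sketched in Python. This is only an illustration and not Git's own parser: the function name parse_stage_line and the file name README.md are made up for the example, and the hash used is the well-known SHA-1 of an empty blob. It assumes the usual output layout of mode, hash and stage number separated by spaces, followed by a tab and the file name.

```python
# Object types held in the upper four bits of the 16-bit mode value.
OBJECT_TYPES = {
    0b1000: "regular file",
    0b1010: "symlink",
    0b1110: "gitlink",
}


def parse_stage_line(line):
    """Split one line of 'git ls-files --stage' output into its fields."""
    meta, path = line.rstrip("\n").split("\t", 1)
    mode_text, sha, stage = meta.split(" ")
    mode = int(mode_text, 8)  # the mode is printed in octal
    return {
        "type": OBJECT_TYPES.get(mode >> 12, "unknown"),  # top four bits
        "permissions": format(mode & 0o777, "o"),         # last nine bits
        "sha": sha,
        "stage": int(stage),
        "path": path,
    }


# Hypothetical entry: an empty README.md staged without conflicts.
sample = "100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0\tREADME.md"
entry = parse_stage_line(sample)
print(entry["type"], entry["permissions"], entry["stage"], entry["path"])
```

Shifting right by 12 bits skips the nine permission bits and the three unused bits, which leaves only the object type.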
The HEAD File
It is used to refer to the latest commit in the current branch.
Usually it does not contain a commit SHA-1, but contains the path to a file (of the name of the current branch) in the refs directory which stores the last commit’s SHA-1 hash in that branch.
It contains a commit’s SHA-1 hash when a specific commit or tag is checked out. (Detached HEAD state.)
Eg:
# in the 'main' branch
$ cat .git/HEAD
ref: refs/heads/main
$ git switch test_branch
Switched to branch 'test_branch'
$ cat .git/HEAD
ref: refs/heads/test_branch
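The two cases can also be followed by reading the files directly. The sketch below is a simplified illustration: resolve_head is a hypothetical helper, the commit hash is a placeholder, and the directory is a throwaway imitation of .git that holds only the two files involved. It does not look up refs that exist only in the packed-refs file.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir):
    """Return (branch_ref, commit_hash) for the HEAD file in git_dir.

    branch_ref is None in the detached HEAD state.
    """
    git_dir = Path(git_dir)
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]
        ref_file = git_dir / ref
        # The ref file is absent for a branch without commits or a packed ref.
        commit = ref_file.read_text().strip() if ref_file.is_file() else None
        return ref, commit
    return None, head


with tempfile.TemporaryDirectory() as tmp:
    fake_git = Path(tmp) / ".git"
    (fake_git / "refs" / "heads").mkdir(parents=True)
    placeholder_hash = "1" * 40  # hypothetical commit hash
    (fake_git / "refs" / "heads" / "main").write_text(placeholder_hash + "\n")

    # HEAD points at a branch.
    (fake_git / "HEAD").write_text("ref: refs/heads/main\n")
    attached = resolve_head(fake_git)

    # HEAD holds a commit hash directly (detached HEAD state).
    (fake_git / "HEAD").write_text(placeholder_hash + "\n")
    detached = resolve_head(fake_git)

print(attached)
print(detached)
```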
The refs Directory
.git
├───...
└───refs
    ├───heads
    │   └───<branch_name(s)>
    ├───remotes
    │   └───<remote_alias(es)>
    │       └───<branch_name(s)>
    ├───stash
    └───tags
        └───<tag_name(s)>
- This directory holds the reference to the latest commit in every local branch and fetched remote branch in the form of the SHA-1 hash of the commit.
- It also stores the SHA-1 hash of the commit which has been tagged.
- The HEAD file references a file (of the name of the branch that is currently checked out) from the heads directory in this (refs) directory.
The packed-refs File
- One file is created per branch and tag in the refs directory.
- In a repository with a lot of branches and tags, there is a huge number of refs, and a lot of them are not actively used/changed.
- These refs occupy a lot of storage space and cause performance issues.
- The git pack-refs command is used to solve this problem. It stores the refs (all of them when run with the --all option) in a single file called packed-refs.
- If a ref is missing from the usual refs directory after packing, it is looked up in this file and used if found.
- Subsequent updates to a packed branch ref create a new file in the refs directory as usual.
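The lookup order described above (the refs directory first, then packed-refs) can be sketched as follows. The helper names parse_packed_refs and lookup_ref and all the hashes are hypothetical. The sketch assumes the usual layout of the file: an optional header line starting with #, one "hash ref-name" pair per line, and lines starting with ^ that hold the commit an annotated tag points to.

```python
import tempfile
from pathlib import Path


def parse_packed_refs(text):
    """Map ref names to hashes from the contents of a packed-refs file."""
    refs = {}
    for line in text.splitlines():
        # Skip blank lines, the header comment and peeled-tag ("^") lines.
        if not line or line.startswith("#") or line.startswith("^"):
            continue
        sha, name = line.split(" ", 1)
        refs[name] = sha
    return refs


def lookup_ref(git_dir, name):
    """Prefer the file in the refs directory, then fall back to packed-refs."""
    git_dir = Path(git_dir)
    loose = git_dir / name
    if loose.is_file():
        return loose.read_text().strip()
    packed = git_dir / "packed-refs"
    if packed.is_file():
        return parse_packed_refs(packed.read_text()).get(name)
    return None


with tempfile.TemporaryDirectory() as tmp:
    fake_git = Path(tmp)
    (fake_git / "packed-refs").write_text(
        "# pack-refs with: peeled fully-peeled sorted\n"
        + "a" * 40 + " refs/heads/main\n"
        + "b" * 40 + " refs/tags/v1.0\n"
        + "^" + "c" * 40 + "\n"
    )
    packed_main = lookup_ref(fake_git, "refs/heads/main")
    packed_tag = lookup_ref(fake_git, "refs/tags/v1.0")

    # A later update to the branch creates a file in the refs directory again.
    (fake_git / "refs" / "heads").mkdir(parents=True)
    (fake_git / "refs" / "heads" / "main").write_text("d" * 40 + "\n")
    updated_main = lookup_ref(fake_git, "refs/heads/main")

print(packed_main, packed_tag, updated_main)
```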
The logs Directory
.git
├───...
└───logs
    ├───HEAD
    └───refs
        ├───heads
        │   └───<branch_name(s)>
        ├───remotes
        │   └───<remote_alias(es)>
        │       └───<branch_name(s)>
        └───stash
- Contains the history of all changes to HEAD and to the tip of each branch, in order.
- Every row consists of the previous SHA-1 hash (the parent commit’s, for a plain commit), the new SHA-1 hash, the committer’s name and e-mail, the Unix Epoch Time of the change, the time zone, the type of action and message in order.
- There are logs for every branch in the local Git repository and for the fetched branches from the remote Git repository/repositories (if any).
- Inside the logs directory
  - The HEAD file stores information about all the commands executed by the user that changed HEAD, such as branch switches, commits, rebases, etc.
  - The files in the refs directory only include branch specific operations and history, such as commits, pulls, resets, rebases, etc.
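The row layout described above can be taken apart with a regular expression. This is a rough sketch and not Git's own parser: the name parse_reflog_line, the committer and both hashes are invented for illustration, and the pattern assumes that a tab separates the time zone from the action and message.

```python
import re
from datetime import datetime, timedelta, timezone

REFLOG_LINE = re.compile(
    r"^(?P<old>[0-9a-f]{40}) (?P<new>[0-9a-f]{40}) "
    r"(?P<name>.*) <(?P<email>[^>]*)> "
    r"(?P<timestamp>\d+) (?P<tz>[+-]\d{4})"
    r"(?:\t(?P<message>.*))?$"
)


def parse_reflog_line(line):
    """Split one row of a file under .git/logs into named fields."""
    match = REFLOG_LINE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError("not a reflog line: %r" % line)
    fields = match.groupdict()
    # Turn the "+HHMM" / "-HHMM" time zone into a UTC offset.
    sign = -1 if fields["tz"][0] == "-" else 1
    offset = timedelta(hours=int(fields["tz"][1:3]), minutes=int(fields["tz"][3:5]))
    fields["when"] = datetime.fromtimestamp(
        int(fields["timestamp"]), timezone(sign * offset)
    )
    return fields


# Hypothetical first commit on a branch: the previous hash is all zeros.
sample = (
    "0" * 40 + " " + "2" * 40
    + " Jane Doe <jane@example.com> 1700000000 +0530"
    + "\tcommit (initial): Initial commit"
)
row = parse_reflog_line(sample)
print(row["name"], row["when"].isoformat(), row["message"])
```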
The FETCH_HEAD File
It contains the latest commits of the fetched remote branch(es).
It corresponds to the branch which was
- Checked out when last fetched.
  - From the image above, only one branch is displayed without the not-for-merge text. The odd one out (the ‘main’ branch in this case) is the branch which was checked out while fetching.
- Explicitly mentioned using the git fetch <remote_repo_alias> <branch_name> command.
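To pick out the branch without the not-for-merge text programmatically, the file can be read line by line. The sketch below assumes each line holds three tab-separated fields (the hash, an optional not-for-merge marker and a description of the branch). The helper name parse_fetch_head, the hashes and the remote address are invented for illustration.

```python
def parse_fetch_head(text):
    """Return one dict per line of a FETCH_HEAD file."""
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        sha, marker, note = line.split("\t", 2)
        entries.append(
            {
                "sha": sha,
                # An empty marker field means the branch is meant to be merged.
                "for_merge": marker != "not-for-merge",
                "note": note,
            }
        )
    return entries


# Hypothetical contents after fetching two branches while on 'main'.
sample = (
    "3" * 40 + "\t\tbranch 'main' of example.com:user/repo\n"
    + "4" * 40 + "\tnot-for-merge\tbranch 'dev' of example.com:user/repo\n"
)
entries = parse_fetch_head(sample)
to_merge = [entry["note"] for entry in entries if entry["for_merge"]]
print(to_merge)
```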
The COMMIT_EDITMSG File
- The commit message is written in this file.
- This file is opened in an editor on executing the git commit command.
- It contains the output of the git status command commented out using the # character.
- If there has been a commit before, then this file will show the last commit message along with the git status output just before that commit.
The objects Directory
.git
├───...
└───objects
    ├───<first_2_SHA-1_characters>
    │   └───<remaining_38_SHA-1_characters>
    ├───info
    └───pack
        ├───<*.idx>
        └───<*.pack>
- The most important directory in the .git directory.
- It houses the data of all the Blob, Commit and Tree Objects in the repository, each stored under its SHA-1 hash.
- To decrease access time, objects are placed in buckets (directories), with the first two characters of their SHA-1 hash as the name of the bucket. The remaining 38 characters are used to name the object’s file.
- More on the pack directory.
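The bucket naming can be shown with a few lines of Python. The helper name loose_object_path is made up for this sketch, and the hash used is the well-known SHA-1 of an empty blob. Objects that have been moved into the pack directory will not be found at such a path.

```python
from pathlib import PurePosixPath


def loose_object_path(sha):
    """Return the path of a loose object relative to the repository root."""
    if len(sha) != 40 or any(c not in "0123456789abcdef" for c in sha):
        raise ValueError("expected a 40 character lowercase SHA-1 hash")
    # The first two characters name the bucket, the other 38 name the file.
    return PurePosixPath(".git", "objects", sha[:2], sha[2:])


path = loose_object_path("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")
print(path)
```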
The info Directory
.git
├───...
└───info
    └───exclude
- It contains the exclude file which behaves like the .gitignore file, but is used to ignore files locally without modifying .gitignore.
- More on the exclude file.
The config File
- This file contains the local Git repository configuration.
- It can be modified using the git config --local command.
The addp-hunk-edit.diff File
- Created when the e (edit) option is chosen in the git add --patch command.
- Enables the manual edit of a hunk of a file to be staged.
The ORIG_HEAD File
- It contains the SHA-1 hash of a commit.
- It is the previous state of the HEAD, but not necessarily the immediate previous state.
- It is set by certain commands which have destructive/dangerous behaviour, so it usually points to where HEAD was just before the latest destructive change.
- It is less useful now because of the git reflog command, which makes reverting/resetting to a particular commit easier.
The description File
- This is the description of the repository.
- This file is used by GitWeb, which hardly anyone uses today, so it can be left alone.