Many of us create a few Git commits daily, either through GUI tools or the command line. It can be as simple as following these steps:

shell
# 1. Modify or create a file in your working directory.
echo '# my change' > 'test.sh'

# 2. Add the modification to the staging area of Git.
git add test.sh

# 3. Commit the staged changes.
git commit -m "initial commit"

Here, we've used Git's high-level commands (also known as Porcelain commands), such as git add and git commit. However, there is another group of Git commands, known as Plumbing commands, that handle the low-level operations.

In this blog post, we want to create a Git commit using these low-level operations, and not the git commit command.

Before diving into low-level commands and crafting a Git commit, we need to understand a few Git basics. Let's start with the states of a file in Git.

The Basics

Files in Git can be in one of these three states:

  1. Modified: The file has changed but has not been committed to the Git database.
  2. Staged: The current version of the modified file is staged to be included in the next commit.
  3. Committed: Data is safely stored in the Git database.

Similarly, a Git project has three sections:

  1. Working Directory: These are the files that are pulled out of the Git database so you can easily modify them. Modified files reside here.
  2. Staging Area (Index): A file inside the .git directory that holds the information about what will go into your next commit. Staged files reside here.
  3. Git directory: It's where Git stores all the objects and metadata of your repository. This directory is essentially what you copy when you clone a Git project. Committed files reside here.

[Figure: Git Sections]

Now that we understand the different sections of a Git project, we need to know what exactly a Git commit is.

Git Objects: Commits, Trees, and Blobs

A Git commit is a Git object. There are several types of objects in Git, including blob, tree, and commit objects. These objects can be found inside the .git/objects directory.

If you look inside that directory, you'll see that every object is stored under the SHA-1 hash of its content rather than under a file name. This approach lets Git track all changes to the content of a file or directory and makes it practically impossible to alter the content without Git detecting it.

Blob Objects

We can store any blob (binary file) inside Git's database, making it a powerful content-addressable file system with a version-control system built on top of it. This can easily be done using one of Git's plumbing commands called git hash-object:

shell
echo 'hello world' | git hash-object -w --stdin

The -w flag tells Git not only to return the hash of the content passed to it via standard input (--stdin) but also to store that content inside the .git/objects directory as a blob. Essentially, Git builds the following content, compresses it with zlib, and writes it to a file:

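// JavaScript (Node.js) template literal used for clarity
const { createHash } = require('node:crypto')
const content = 'hello world\n' // echo appends the trailing newline
const blobFileContent = `blob ${Buffer.byteLength(content)}\0${content}`
const blobFileName    = createHash('sha1').update(blobFileContent).digest('hex')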

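In the "hello world" case, echo appends a newline, so the content is 12 bytes long and the content of the blob file becomes: blob 12\0hello world\n. Git then calculates the SHA-1 hash of this content (3b18e512dba79e4c8300dd08aeb37f8e728b8dad) and stores the file under .git/objects, using the first two hex characters of the hash as a subdirectory name and the remaining 38 characters as the filename.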
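
As a quick sanity check, we can reproduce that hash by hand. This is a minimal sketch that assumes a Unix-like shell with sha1sum available (on macOS, shasum is the usual substitute) and Git's default SHA-1 object format. Keep in mind that echo appends a newline, so the header has to say 12 bytes:

```shell
# Hash computed by Git (no -w, so nothing is written to the object database).
git_hash=$(echo 'hello world' | git hash-object --stdin)

# The same hash computed manually: the header "blob 12", a NUL byte,
# then the 12 content bytes ("hello world" plus the trailing newline).
manual_hash=$(printf 'blob 12\0hello world\n' | sha1sum | cut -d ' ' -f 1)

echo "$git_hash"
echo "$manual_hash"
```

Both lines should print 3b18e512dba79e4c8300dd08aeb37f8e728b8dad.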

Tree Objects

Tree objects allow us to store file names for one or more files. You can think of tree objects as representing directories, while blob objects represent file contents. Essentially, a tree is a collection of references to blobs and other trees, along with their names and file modes.

[Figure: Git Tree]

This is the content of the tree object shown in the image above:

100644 blob 8b137891791fe96927ad78e64b0aad7bded08bdc    README
100644 blob 8b137891791fe96927ad78e64b0aad7bded08bdc    package.json
040000 tree 9c422c2393ba5463772797e780e1d4c00400374c    src
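
To see a listing like this for yourself, you can let Git build a tree from a small directory and then print it. The sketch below runs in a throwaway repository created with mktemp; the file layout (README, package.json, src/index.js) is a hypothetical one chosen to mirror the listing above:

```shell
# Work in a throwaway repository so nothing else is touched.
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Hypothetical layout: two files containing a single newline, plus one subdirectory.
echo > README
echo > package.json
mkdir src
echo 'console.log("hi")' > src/index.js

# Stage everything, then write the index out as tree objects.
git add .
tree=$(git write-tree)

# Pretty-print the root tree: one entry per line (mode, type, hash, name).
git cat-file -p "$tree"
```

The src entry shows up as a tree rather than a blob; running git cat-file -p on its hash would list index.js in turn.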

Commit Objects

A Git commit is essentially an object that contains a reference to a Git tree, along with information such as who made the changes (author), when they were made, and why they were made (commit message). A commit can also have zero parents (initial commit), one parent (normal commits), or multiple parents (merge commits).

This is the content of an example commit object:

Note: The commit message is separated from the metadata by an empty line.
tree 5fb4d17478fc270ea606c010293c97bb76dec583
author Avestura <me@avestura.dev> 1725466118 +0330
committer Avestura <me@avestura.dev> 1725466118 +0330

initial commit

Now that we understand blob, tree, and commit objects, we can visualize their relationships. Consider a simple scenario like this:

shell
git init # initialize the .git repository
echo 'Readme' > README
echo 'License' > LICENSE
git add README LICENSE
git commit -m 'initial commit'

In this case, a total of four objects are created in Git:

  • 1 README blob object
  • 1 LICENSE blob object
  • 1 tree object that contains references to the previous blobs and their names
  • 1 commit object that references the tree and includes the author information
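
If you want to confirm the count, Git can list every object in its database together with its type. This is a sketch in a throwaway repository; the Example identity set through the environment variables is a placeholder so that the commit works without any global Git configuration:

```shell
# Work in a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Placeholder identity (an assumption for this example only).
export GIT_AUTHOR_NAME='Example' GIT_AUTHOR_EMAIL='example@example.com'
export GIT_COMMITTER_NAME='Example' GIT_COMMITTER_EMAIL='example@example.com'

echo 'Readme' > README
echo 'License' > LICENSE
git add README LICENSE
git commit --quiet -m 'initial commit'

# Print one line per object: "<hash> <type> <size>".
git cat-file --batch-all-objects --batch-check
```

The output should contain four lines: two blobs, one tree, and one commit.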

If we add another commit, the new commit will have a parent field pointing to the initial commit:

[Figure: Git Commit]
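
The parent link is easy to observe by printing the second commit object. The following sketch again uses a throwaway repository and a placeholder Example identity; the second commit message is made up for illustration:

```shell
# Work in a throwaway repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
export GIT_AUTHOR_NAME='Example' GIT_AUTHOR_EMAIL='example@example.com'
export GIT_COMMITTER_NAME='Example' GIT_COMMITTER_EMAIL='example@example.com'

# First commit: it has no parent.
echo 'Readme' > README
git add README
git commit --quiet -m 'initial commit'
first=$(git rev-parse HEAD)

# Second commit: Git records the first commit as its parent.
echo 'License' > LICENSE
git add LICENSE
git commit --quiet -m 'add license'

# The printed object contains a "parent <hash>" line naming the first commit.
echo "first commit: $first"
git cat-file -p HEAD
```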

Craft a Commit, the hard way

Now that we understand the Git objects related to a commit and their relationships, we can easily create a commit using low-level Git plumbing commands.

First of all, we need to initialize a new repository:

shell
$ git init
Initialized empty Git repository in E:/Projects/git/git-playground/.git/

Now we have to create a blob object. As we already know, we can do this using the hash-object command:

shell
$ echo 'This is the content of my file' | git hash-object -w --stdin
6b59acb69a04903bfa9189e3c482fb57f77393f9

We have stored our blob object and know its hash. Now we need to create a tree object. Git normally uses the staging area (index) to create tree objects. We can create an index with a single entry (our previously created blob) using the git update-index command:

shell
git update-index --add --cacheinfo 100644 6b59acb69a04903bfa9189e3c482fb57f77393f9 myfile.txt

Explanation of the above command:

  • --add adds the file to the index, as it isn’t already there.
  • --cacheinfo <mode> <object> <path> is used because the file does not exist in our working directory, only in Git's database.
    • 100644 is the file mode, which means a normal file. Other modes include 100755 (executable file) and 120000 (symbolic link).
    • 6b59acb69a04903bfa9189e3c482fb57f77393f9 is the hash of the blob
    • myfile.txt is the name of the file
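
To check what ended up in the index, we can list its entries with git ls-files --stage. This sketch repeats the two steps so far in a throwaway repository created with mktemp:

```shell
# Work in a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Store the blob and register it in the index under the name myfile.txt.
blob=$(echo 'This is the content of my file' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$blob" myfile.txt

# Show index entries as "<mode> <hash> <stage>", a tab, and the path.
git ls-files --stage

# The working directory is still empty: the entry exists only in the index.
ls
```

The index now has one entry, even though no file called myfile.txt exists on disk.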

Now that we have the index file ready, we can create a tree object from it using write-tree:

shell
$ git write-tree
de53417c67393f9ef09239709759ecbbd5ebfb97

Git now outputs the hash of the tree object. You can check its content using the cat-file command:

shell
$ git cat-file -p de53417c67393f9ef09239709759ecbbd5ebfb97
100644 blob 6b59acb69a04903bfa9189e3c482fb57f77393f9    myfile.txt

Now that our tree object is ready and connected to the underlying blob, we can simply create the commit object using the git commit-tree command:

shell
$ echo 'My commit message' | git commit-tree de53417c67393f9ef09239709759ecbbd5ebfb97
409399744678c13717b30c103feef9451c9103bf

Finally, we have created a commit without using any of the high-level Git commands (e.g. git commit). You can view the content of the newly created commit:

shell
$ git cat-file -p 409399744678c13717b30c103feef9451c9103bf
tree de53417c67393f9ef09239709759ecbbd5ebfb97
author Avestura <me@avestura.dev> 1725470340 +0330
committer Avestura <me@avestura.dev> 1725470340 +0330

My commit message
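
Copying hashes by hand gets tedious, so the whole sequence can also be chained with shell variables. This is only a sketch of the same steps, run in a throwaway repository with a placeholder Example identity (both are assumptions for illustration):

```shell
# Work in a throwaway repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
export GIT_AUTHOR_NAME='Example' GIT_AUTHOR_EMAIL='example@example.com'
export GIT_COMMITTER_NAME='Example' GIT_COMMITTER_EMAIL='example@example.com'

# 1. Blob: store the file content.
blob=$(echo 'This is the content of my file' | git hash-object -w --stdin)

# 2. Index: register the blob under a file name and mode.
git update-index --add --cacheinfo 100644 "$blob" myfile.txt

# 3. Tree: write the index out as a tree object.
tree=$(git write-tree)

# 4. Commit: wrap the tree with author information and a message.
commit=$(echo 'My commit message' | git commit-tree "$tree")

git cat-file -p "$commit"
```

The blob and tree hashes depend only on the file content, name, and mode, so they should match the ones shown in this post. The commit hash will differ, because it also covers the author, the committer, and the timestamps.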

You can also view the log of the commit using git log:

shell
$ git log --stat 409399744678c13717b30c103feef9451c9103bf
commit 409399744678c13717b30c103feef9451c9103bf
Author: Avestura <me@avestura.dev>
Date:   Wed Sep 4 20:49:00 2024 +0330

    My commit message

 myfile.txt | 1 +
 1 file changed, 1 insertion(+)

If you want to see the files in your working directory, you can reset your current branch to point to the newly created commit using git reset:

shell
$ git reset --hard 409399744678c13717b30c103feef9451c9103bf
HEAD is now at 4093997 My commit message

$ ls
myfile.txt

$ cat myfile.txt
This is the content of my file

🥳 Hooray! We have crafted our commit and seen it in our working directory!
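
If you would rather stay with plumbing for this last step too, git update-ref can point a branch at the commit. The sketch below rebuilds the commit in a throwaway repository with a placeholder Example identity; crafted is a hypothetical branch name picked for this example:

```shell
# Work in a throwaway repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
export GIT_AUTHOR_NAME='Example' GIT_AUTHOR_EMAIL='example@example.com'
export GIT_COMMITTER_NAME='Example' GIT_COMMITTER_EMAIL='example@example.com'

# Rebuild the blob, index entry, tree, and commit from the previous steps.
blob=$(echo 'This is the content of my file' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$blob" myfile.txt
tree=$(git write-tree)
commit=$(echo 'My commit message' | git commit-tree "$tree")

# Create a branch reference that points at our commit.
git update-ref refs/heads/crafted "$commit"

# The commit is now reachable by branch name instead of by hash.
git log --oneline crafted
```

Unlike git reset --hard, update-ref only moves the reference; it does not touch the index or the working directory.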

Conclusion

Git has two sets of commands: Porcelain (high-level commands) such as git add, git commit, git remote, etc., and low-level Plumbing commands, which are used by higher-level commands to manipulate Git objects and references. We used these low-level commands to craft a commit by creating its underlying tree and blob objects.

References

Resources I've used to write this blog post:

  • Chacon, S., & Straub, B. (2014). Pro Git (2nd ed.). Apress.

You can read a more detailed version of what is explained in this blog post on page 419 of Pro Git (2nd ed.). An online version is available at: Chapter 10 Git Internals - 2. Git Objects