📝 Git & GitHub

How Git stores data 💾

Author

Pyland

📅

Published

06.05.2026

⏱️

Reading time

3 min

🌿

Level

Medium

#git

Have you ever wondered how Git works so fast? How it stores the entire project history while taking up so little space? Let’s look under the hood!

The magic of Git: snapshots, not diffs

Many older version control systems, such as CVS and Subversion, store changes (deltas):

File v1: "Hello"
Change 1: +6 characters " World"
Change 2: +1 character "!"

Git works differently. It takes snapshots of the entire project:

Commit 1: full project snapshot
Commit 2: full project snapshot (with changes)
Commit 3: full project snapshot

How does it work?

1. Hashing (SHA-1)

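Git computes a SHA-1 hash (checksum) of each file’s contents, prefixed with a short header that records the object type and size:

# File: hello.txt contains "Hello World" followed by a newline
# Git computes the SHA-1 hash of "blob 12\0" + contents: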
557db03de997c86a4a028e1ebd3a1ceb225be238

If the file hasn’t changed, the hash is identical.
If even one character changes, the hash is completely different.
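
To make the hashing step concrete, here is a minimal Python sketch of how a blob ID can be computed outside of Git. The function name git_blob_hash is made up for this illustration. If hello.txt ends with a newline, as files written by echo do, the first printed line should match the hash shown above.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a short header ("blob <size>" plus a NUL byte) followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = git_blob_hash(b"Hello World\n")
unchanged = git_blob_hash(b"Hello World\n")
edited = git_blob_hash(b"Hello World!\n")

print(original)
print(original == unchanged)  # identical content, identical hash
print(original == edited)     # one extra character, different hash
```

Because the ID depends only on the bytes, two files with the same contents get the same ID no matter what they are called.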

2. Git objects

Git stores 4 types of objects:

1. Blob (Binary Large Object)
- The file contents
- Pure data, no filename attached

2. Tree
- A directory in the filesystem
- A list of entries, each with a name, a file mode, and the hash of a blob (file) or another tree (subdirectory)

3. Commit
- A snapshot of the project at a point in time
- Points to a tree
- Points to its parent commit(s): none for the very first commit, two or more for a merge
- Contains author, date, message

4. Tag
- A named, annotated label for a commit (a lightweight tag is only a ref and creates no object)
- For example, “v1.0.0”

Example: how Git stores a commit

Consider a simple project:

my-project/
├── README.md
└── src/
    └── main.py

What Git creates:

BLOB for README.md
  hash: abc123...
  content: "# My Project\n..."

BLOB for main.py
  hash: def456...
  content: "print('Hello')"

TREE for src/
  hash: ghi789...
  main.py -> def456...

TREE for root
  hash: jkl012...
  README.md -> abc123...
  src -> ghi789...

COMMIT
  hash: mno345...
  tree: jkl012...
  parent: previous commit
  author: "Vasya <vasya@example.com>"
  date: "2026-04-10 15:00:00"
  message: "Add README"
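
The same structure can be reproduced in a few lines of Python. This is a sketch rather than Git’s own code: the helper names object_id and tree_body, the file contents, and the timestamp are all assumptions made for illustration. It shows how each tree refers to blobs and subtrees by hash, and how the commit refers to the root tree.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    # Every object is hashed as "<kind> <size>", a NUL byte, then the body.
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: list[tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex_id) entries; each ID is stored as 20 raw bytes."""
    # Git sorts entries by name, comparing directories as if they ended in "/".
    def sort_key(entry):
        mode, name, _ = entry
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_id)
    return body


readme = object_id("blob", b"# My Project\n")
main_py = object_id("blob", b"print('Hello')\n")
src_tree = object_id("tree", tree_body([("100644", "main.py", main_py)]))
root_tree = object_id("tree", tree_body([
    ("100644", "README.md", readme),
    ("40000", "src", src_tree),
]))

# Timestamp and timezone are made-up values; a first commit has no parent line.
person = "Vasya <vasya@example.com> 1775833200 +0000"
commit_text = (
    f"tree {root_tree}\n"
    f"author {person}\n"
    f"committer {person}\n"
    "\n"
    "Add README\n"
)
commit = object_id("commit", commit_text.encode())

print("root tree:", root_tree)
print("commit:   ", commit)
```

Changing a single byte in main.py would change its blob ID, then the ID of the src tree, then the root tree, and finally the commit.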

Saving space: deduplication

The clever part: if a file hasn’t changed between commits, Git does NOT create a new copy!

Commit 1:
  README.md -> blob abc123

Commit 2 (only main.py changed):
  README.md -> blob abc123 (THE SAME blob!)
  main.py -> blob xyz999 (new blob)

Result: massive storage savings!
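
Deduplication falls out of content addressing almost for free. The toy ObjectStore class below is a hypothetical in-memory stand-in for the object database, not Git’s implementation. It shows that committing an unchanged file again adds nothing to the store.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: the key is the hash of the content."""

    def __init__(self):
        self.objects: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        header = b"blob " + str(len(content)).encode("ascii") + b"\0"
        key = hashlib.sha1(header + content).hexdigest()
        # Writing the same content twice lands on the same key: no new copy.
        self.objects.setdefault(key, content)
        return key


store = ObjectStore()

# Commit 1: two files.
commit1 = {
    "README.md": store.put(b"# My Project\n"),
    "main.py": store.put(b"print('Hello')\n"),
}

# Commit 2: only main.py changed.
commit2 = {
    "README.md": store.put(b"# My Project\n"),
    "main.py": store.put(b"print('Hello, Git')\n"),
}

print(commit1["README.md"] == commit2["README.md"])  # True: same blob reused
print(len(store.objects))                            # 3 blobs for 4 file versions
```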

Compression and pack files

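Every object is compressed with zlib when it is written. Over time, Git additionally combines objects into pack files:

- Similar files are compressed together
- Older versions of files are typically stored as deltas (diffs) against newer ones
- This happens automatically during housekeeping, and you can trigger it manually with git gc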
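
The benefit of storing similar content together can be seen with plain zlib. Note the assumption: Git’s pack format uses its own delta encoding on top of zlib, which this sketch does not reproduce. It only illustrates why a second, nearly identical version costs very little extra space.

```python
import zlib

# Two versions of a file that differ in a single line.
version1 = "".join(f"line {i}: value {i * i}\n" for i in range(200)).encode()
version2 = version1.replace(b"line 100:", b"LINE 100:")

# Compress each version on its own, then both in one stream.
separate = len(zlib.compress(version1)) + len(zlib.compress(version2))
together = len(zlib.compress(version1 + version2))

print("raw bytes:          ", len(version1) + len(version2))
print("compressed apart:   ", separate)
print("compressed together:", together)
```

When both versions share one stream, the compressor can encode most of the second version as references back to the first.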

Advantages of Git’s approach

✅ Speed

Most everyday operations are local and need no network access:
- Viewing history — instant
- Switching branches — seconds
- Comparing versions — fast

✅ Integrity

Every object is identified by its hash:
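- Changing any past content changes its hash, and with it the hash of every later commit, so history cannot be rewritten unnoticed
- Data corruption is detected when objects are verified, for example with git fsck
- History forms a hash chain: each commit’s hash covers its tree and its parent commits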
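
The integrity check itself is simple to sketch: recompute the hash and compare it with the name the object is stored under. The helper names object_id and verify are made up for this example.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    # Same scheme as Git: "<kind> <size>", a NUL byte, then the body.
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def verify(expected_id: str, kind: str, body: bytes) -> bool:
    """Recompute the hash and compare it with the ID the object is stored under."""
    return object_id(kind, body) == expected_id


body = b"print('Hello')\n"
stored_id = object_id("blob", body)

corrupted = bytearray(body)
corrupted[0] ^= 0x01  # flip a single bit

print(verify(stored_id, "blob", body))              # True
print(verify(stored_id, "blob", bytes(corrupted)))  # False
```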

✅ Compactness

Thanks to deduplication and compression:
- Many versions of files take up little space
- You can store the full project history

✅ Distribution

By default, every clone is a complete copy (shallow and partial clones are opt-in exceptions):
- Full project history
- All branches
- All tags

Where does Git store data?

Everything lives in the .git/ directory:

.git/
├── objects/      # Blob, tree, commit, and tag objects (loose and packed)
├── refs/         # Pointers to branches and tags
├── HEAD          # Pointer to the current branch (or commit)
├── index         # Staging area
└── config        # Configuration
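
Inside objects/, a loose object lives in a file named after its hash: the first two hex digits form a subdirectory and the remaining 38 the file name, with the contents zlib-compressed. The sketch below writes and reads such a file in a temporary scratch directory. The function names are hypothetical, and it deliberately does not touch a real repository.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(git_dir: Path, kind: str, body: bytes) -> str:
    """Store an object the way Git lays out a loose object and return its ID."""
    raw = f"{kind} {len(body)}".encode("ascii") + b"\0" + body
    object_id = hashlib.sha1(raw).hexdigest()
    # First two hex digits name the directory, the remaining 38 the file.
    path = git_dir / "objects" / object_id[:2] / object_id[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(raw))
    return object_id


def read_loose_object(git_dir: Path, object_id: str) -> tuple[str, bytes]:
    """Decompress a loose object and split it into (kind, body)."""
    path = git_dir / "objects" / object_id[:2] / object_id[2:]
    raw = zlib.decompress(path.read_bytes())
    header, _, body = raw.partition(b"\0")
    kind, _size = header.decode("ascii").split(" ")
    return kind, body


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"  # a scratch directory, not a real repository
    oid = write_loose_object(git_dir, "blob", b"Hello Git\n")
    print(oid[:2] + "/" + oid[2:])
    print(read_loose_object(git_dir, oid))
```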

Practical example

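# Create an empty repository to experiment in
git init demo && cd demo

# Create a file
echo "Hello Git" > test.txt

# Add to staging
git add test.txt

# Git created a blob object!
# Print its hash, then inspect its type and contents:
git hash-object test.txt
git cat-file -t "$(git hash-object test.txt)"   # blob
git cat-file -p "$(git hash-object test.txt)"   # Hello Git

# Make a commit
git commit -m "Add test"

# Git has now created:
# - blob for test.txt (already at "git add")
# - tree for the root
# - commit object
# Inspect the commit and the tree it points to:
git cat-file -p HEAD
git cat-file -p "HEAD^{tree}"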

Interesting facts

🔍 SHA-1 collisions:
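- Accidental collisions are so improbable that they can be ignored in practice
- Deliberate collisions are feasible: researchers published one in 2017 (the SHAttered attack), and Git now uses a SHA-1 implementation that detects such attacks
- Git is transitioning to SHA-256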

📦 Size of the .git directory:
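- Varies widely: it depends on the length of the history and on how many large binary files were committed
- Linux kernel: 20+ years of history and over a million commits fit in a repository of a few gigabytes
- Check your own repository with git count-objects -vH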

🚀 Speed:
- git log — instant (local database)
- svn log — seconds (server request)

Takeaways

Git is smart because:

✅ It stores snapshots, not deltas
✅ It uses hashing for identification
✅ It deduplicates unchanged files
✅ It compresses data automatically
✅ It works locally (fast!)
