What is a Version Control System (VCS)?

Before delving into Git, we need to understand what a Version Control System is, because Git is one kind of VCS, with its own unique concepts.

Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later.

With a VCS, you can revert one or more files, or even an entire project with X * 3.14 files, to any earlier state (as long as that state was saved as a version). A VCS also offers many secondary benefits, such as protecting against lost source code, showing who last worked on a file, and making it easy to share files with the colleagues you work with (remember the days of passing around USB drives or uploading source files to Google Drive? Nostalgic times 🙄).

Local Version Control Systems (LVCSs)

Since ancient times, people have managed versions by copying files into another folder and giving that folder a "newer" name than the old one (for example, poon-v1 to poon-v2). This approach is popular because it is simple, but it is risky: you can lose track of which folder you are in, accidentally edit the wrong file, or copy the wrong thing.

With everything hanging by a thread, true developers came up with a solution: a local database that records file changes, plus a "time travel" tool to move from one version to another (see RCS, the Revision Control System).
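To make the "local database of changes" idea concrete, here is a minimal sketch of a delta-based local store, loosely in the spirit of RCS. It is not how RCS actually formats its files: `LocalDeltaStore`, `make_delta`, and `apply_delta` are hypothetical names, and a file is modeled as a plain list of lines. The first version is kept in full, every later version is saved only as the regions that changed, and "time travel" means replaying those deltas.

```python
import difflib
from typing import List, Tuple

# A delta lists only the changed regions of the previous version:
# (start, end, replacement_lines) means "replace old[start:end] with replacement_lines".
Delta = List[Tuple[int, int, List[str]]]


def make_delta(old: List[str], new: List[str]) -> Delta:
    """Record only what changed between two versions of a file."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(old: List[str], delta: Delta) -> List[str]:
    """Rebuild the newer version from the older one plus its delta."""
    result: List[str] = []
    cursor = 0
    for start, end, replacement in delta:
        result.extend(old[cursor:start])  # unchanged lines before this change
        result.extend(replacement)         # inserted or replacing lines
        cursor = end                       # skip the lines that were replaced or deleted
    result.extend(old[cursor:])            # unchanged tail
    return result


class LocalDeltaStore:
    """Hypothetical local "version database": first version in full, then deltas."""

    def __init__(self, initial: List[str]):
        self.base = list(initial)
        self.deltas: List[Delta] = []

    def commit(self, new: List[str]) -> int:
        latest = self.checkout(len(self.deltas))
        self.deltas.append(make_delta(latest, new))
        return len(self.deltas)  # version number of what was just saved

    def checkout(self, version: int) -> List[str]:
        """'Time travel': replay deltas on top of the base up to `version`."""
        lines = self.base
        for delta in self.deltas[:version]:
            lines = apply_delta(lines, delta)
        return list(lines)


store = LocalDeltaStore(["poon", "v1"])
v1 = store.commit(["poon", "v2", "more features"])
v2 = store.commit(["poon", "v3"])
```

Because only the changed regions are stored, old versions are cheap to keep, but rebuilding a version means replaying every delta before it. Note that everything still lives in one place on one machine.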

Centralized Version Control Systems (CVCSs)

Just when everything seemed fine with LVCSs, a new problem arose. We all know that "cool" software is rarely built by just one person (except for hardcore freelancers churning out "instant noodle" projects 😂). Teams need a VCS that lets developers collaborate on building and developing a product. That's where CVCSs come into play.

Basically, CVCSs are not much different from LVCSs. As the name suggests, the database that stores changes lives on a central server instead of on each local machine.

In addition to solving the problems of LVCSs, CVCSs bring other benefits, such as giving everyone on the team visibility into what is happening in the project. Administrators can see who did what and manage the version database far more easily than with LVCSs (where they would have to visit each machine one by one 🙄).

Although CVCSs have many advantages over LVCSs, they still have limitations. The most noticeable one is the centralized storage itself: like any other centralized storage "genre," it is a single point of failure, so everything collapses if the central server has a problem. That problem could be something small, like a temporary outage or maintenance (time for developers to grab a coffee or have a little drink 🤪), or something big, like data loss, a damaged hard drive, or a hacker breaking in.
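The single point of failure can be sketched with a toy model. `CentralServer` and `Client` are hypothetical classes, and a version is simplified to a `{filename: content}` dictionary: the server is the only place that keeps the history, while each client holds just its current working copy.

```python
from typing import Dict, List

Snapshot = Dict[str, str]  # filename -> file content


class CentralServer:
    """Hypothetical CVCS server: the ONLY place where the history lives."""

    def __init__(self) -> None:
        self.history: List[Snapshot] = []

    def commit(self, snapshot: Snapshot) -> None:
        self.history.append(dict(snapshot))  # store a copy of the new version

    def checkout_latest(self) -> Snapshot:
        return dict(self.history[-1]) if self.history else {}


class Client:
    """A developer machine: it holds a working copy, never the history."""

    def __init__(self, server: CentralServer) -> None:
        self.server = server
        self.working_copy: Snapshot = server.checkout_latest()

    def commit(self, filename: str, content: str) -> None:
        self.working_copy[filename] = content
        self.server.commit(self.working_copy)  # every commit must reach the server


server = CentralServer()
alice = Client(server)
alice.commit("app.py", "v1")
alice.commit("app.py", "v2")

server.history.clear()  # simulate a dead hard drive on the central server
# alice still has "v2" in her working copy, but "v1" is gone for everyone.
```

In this model, once the server's history is gone, no client can bring back older versions, which is exactly the risk described above.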

Over time, people added backup mechanisms to limit the damage from such accidents. But the truth remains:

Whenever you have the entire history of the project in a single place, you risk losing everything.
-Anonymous-
(Even LVCSs are susceptible to this truth 👻)

Distributed Version Control Systems (DVCSs)

A "fancier" approach that overcomes this life-threatening limitation of CVCSs is the Distributed Version Control System (DVCS). Instead of checking out only the latest version of each file, a DVCS clones the entire version database from the central server to the local machine. As a result, every local machine becomes a backup of the central server: whenever the server has a problem, its data can be restored from any local clone. The more clones there are, the more backups exist, and the lower the risk of data loss.
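Here is the same toy model adapted to the distributed idea, as a rough sketch rather than how Git actually stores or transfers data. `Repository` is a hypothetical class: cloning copies the whole history, so any clone can rebuild a server that lost everything.

```python
import copy
from typing import Dict, List, Optional

Snapshot = Dict[str, str]  # filename -> file content


class Repository:
    """Hypothetical DVCS repository: every copy carries the full history."""

    def __init__(self, history: Optional[List[Snapshot]] = None) -> None:
        self.history: List[Snapshot] = history if history is not None else []

    def commit(self, snapshot: Snapshot) -> None:
        self.history.append(dict(snapshot))

    def clone(self) -> "Repository":
        # A clone copies the whole version database, not just the latest files.
        return Repository(copy.deepcopy(self.history))


server = Repository()
server.commit({"app.py": "v1"})
server.commit({"app.py": "v2"})

alice = server.clone()  # each developer's machine is a full backup
bob = server.clone()

server.history.clear()  # simulate the central server losing its data
server = alice.clone()  # restore the server, old versions included, from any clone
```

Unlike the CVCS sketch, losing the server here costs nothing: both `alice` and `bob` still hold every version.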

Many tools today are based on the DVCS model, such as Git, Mercurial, Bazaar, and Darcs.

What is Git?

Git is broadly similar to the VCSs introduced above. But wait, what makes Git different from the rest? The answer is its storage mechanism: Git thinks about and "handles" data in an entirely different way. Understanding this difference will help us get to know our friend Git better.

Snapshots, Not Differences

As mentioned above, the biggest difference between Git and other VCSs is how Git handles data. Most other VCSs store information as a list of changes made to each file over time (usually called delta-based version control).

Git is different. It treats its data as a series of snapshots of a miniature filesystem. Every time you commit, Git records what all your files look like at that moment and stores a reference to that snapshot. To stay efficient, if a file has not changed, Git does not store it again; it simply links to the identical file it has already stored. This is why Git's model is often described as a stream of snapshots.
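A minimal sketch of the snapshot idea, assuming a simplified content-addressed store (the names `objects`, `store_blob`, and `take_snapshot` are hypothetical, and real Git also stores trees and commits as objects and hashes a header along with the content). Each commit records every file, but unchanged files point to a blob that is already stored, so nothing is duplicated.

```python
import hashlib
from typing import Dict, List

objects: Dict[str, bytes] = {}  # object store: content hash -> content


def store_blob(content: bytes) -> str:
    """Store content under its hash; identical content is stored only once."""
    key = hashlib.sha1(content).hexdigest()  # simplified: real Git also hashes a header
    objects.setdefault(key, content)
    return key


def take_snapshot(files: Dict[str, bytes]) -> Dict[str, str]:
    """Record the state of EVERY file as {path: hash}, not a list of changes."""
    return {path: store_blob(data) for path, data in files.items()}


history: List[Dict[str, str]] = []
history.append(take_snapshot({"README": b"hello", "app.py": b"print(1)"}))
history.append(take_snapshot({"README": b"hello", "app.py": b"print(2)"}))
```

After the two commits, the store holds only three blobs: `README` was unchanged, so both snapshots link to the same stored content.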

Almost Everything is Local

Indeed, almost every operation in Git needs only local files and resources. Apart from syncing with a remote, we don't need a network connection to use Git (once the repository has been cloned to the machine, of course 🤣). That makes Git very fast: there's no need to contact a central server to review a change made to a file six months ago. Everything is local and always ready, yay 🤘

Data Integrity

Everything in Git is checksummed before it is stored and is then referred to by that checksum. This means you cannot change the contents of any file or directory without Git knowing about it. Because Git uses the SHA-1 hashing algorithm, computed over file contents and directory structure, it can detect data that was corrupted in transit or on disk.

Here's what such a hash looks like (a 40-character hexadecimal string): 24b9da6552252987aa493b52f8696cd6d3b00373
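To see where a hash like the one above comes from, here is a small sketch of how Git computes the ID of a file's contents (a "blob") in the default SHA-1 object format: it hashes a short header, `blob`, a space, the size in bytes, and a NUL byte, followed by the content. The helper names `git_blob_id` and `is_intact` are our own, not Git's.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the ID Git assigns to a file's content (default SHA-1 format).

    Git hashes the header b"blob <size in bytes>" plus a NUL byte,
    followed by the raw content; `git hash-object` reports the same value.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def is_intact(content: bytes, stored_id: str) -> bool:
    """Integrity check: recompute the hash and compare it with the stored ID."""
    return git_blob_id(content) == stored_id


stored = git_blob_id(b"test content\n")
```

Running `echo 'test content' | git hash-object --stdin` should print the same ID as `stored` (`echo` appends the trailing newline). Changing even one byte of the content yields a different ID, so `is_intact` returns `False`, which is how corruption gets noticed.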

Three States of Git

This is the most important part to remember if you want to become friends with Git 💪 Your files can be in one of three main states:

Modified: You have changed the file but have not committed it to your database yet.
Staged: You have marked a modified file, in its current version, to go into your next commit.
Committed: The data is safely stored in your local database.
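The three states can be modeled by comparing three places a file can live: the working directory, the staging area (index), and the last commit. This is a simplified sketch with hypothetical names (`file_state`, `working`, `staging`, `head`); real Git compares content hashes, and a file can be both staged and modified at once, which this model simply reports as "modified".

```python
from typing import Dict

Files = Dict[str, str]  # path -> content


def file_state(path: str, working: Files, staging: Files, head: Files) -> str:
    """Classify a tracked file into one of Git's three states (simplified model).

    working: the working directory (what you edit)
    staging: the staging area, also called the index (what the next commit will contain)
    head:    the last commit (what is safely in the local database)
    A file with both staged and unstaged changes is reported as "modified" here.
    """
    if working.get(path) != staging.get(path):
        return "modified"   # changed, but this change is not staged yet
    if staging.get(path) != head.get(path):
        return "staged"     # marked to go into the next commit
    return "committed"      # all three areas agree


head: Files = {"app.py": "v1"}
staging: Files = dict(head)
working: Files = dict(head)

working["app.py"] = "v2"                       # edit the file
after_edit = file_state("app.py", working, staging, head)

staging["app.py"] = working["app.py"]          # roughly what `git add app.py` does
after_add = file_state("app.py", working, staging, head)

head = dict(staging)                           # roughly what `git commit` does
after_commit = file_state("app.py", working, staging, head)
```

Walking through the example, the file goes from "modified" after the edit, to "staged" after the add, to "committed" after the commit.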

Summary

That's the Git theory I have learned so far. In upcoming posts, I will share more of my insights into Git, such as how to use it and its key concepts.