From Root to Canopy : Discovering the Linux File System
A Discovery article on the Linux File System and its essential parts
Climbing the Linux File Tree branch by branch :
The root '/' : where the tree originates
Everything in Linux originates from the root '/' directory. Everything Linux operates on, such as hardware devices, network sockets, system configs and system processes, is treated as a file that branches off this single origin point called the root '/'. Every absolute path in the system starts from the root.
In this article, the Ubuntu 24.04.3 LTS Linux distribution, installed on Windows using WSL ( Windows Subsystem for Linux ), is used to demonstrate various Linux commands and explore the file system.
When a user logs in to a Linux system, the shell starts inside the user's home directory, which is referred to by the '~' symbol. The cd / command moves to the root directory, and the ls command then lists all the directories that branch off the root.
/bin : Binaries that run the system
One of the most essential directories of a Linux system, /bin includes the essential user binaries available to all users. It contains commands like ls, cp, mv, cat, mount, grep, etc. that must work even when no file system other than the root is mounted.
These commands are beyond the scope of this article, so we will discuss them in a future article.
/bin looks like this if we use ls inside it.
Interesting finds on /bin
In modern Linux, /bin is a symlink ( Symbolic Link, a shortcut to another directory ) pointing to /usr/bin. /bin used to be a separate directory, but the historical hardware limitation behind that split ( small disk sizes ) no longer exists, so its contents have moved to /usr/bin and a symlink is kept in the root for backwards compatibility.
Notice the "bin" ( light blue ) followed by "-> usr/bin"; that arrow marks a symlink.
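To check the same "name -> target" information from a program, here is a minimal Python sketch ( the function name describe_entries is hypothetical ) that lists a directory and reports where each symlink points, much like the arrow in ls -l output. It assumes a Unix-like system where os.readlink is available.

```python
import os
from typing import List


def describe_entries(directory: str = "/") -> List[str]:
    """Return entry names in `directory`, with symlinks shown as 'name -> target'."""
    lines = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.islink(path):
            # os.readlink returns the target exactly as stored in the link, e.g. "usr/bin"
            lines.append(f"{name} -> {os.readlink(path)}")
        else:
            lines.append(name)
    return lines
```

On a merged-/usr system such as Ubuntu 24.04, calling describe_entries("/") should include a line like bin -> usr/bin, the same relative target shown in the ls output.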
/sbin : The admin binaries
Like /bin, /sbin also contains binaries that are essential for the system, but these are administrator commands like reboot, mkfs, ip, etc. Regular users can often execute these binaries, but most of the operations they perform ( rebooting, creating file systems, changing network settings ) require root privileges.
/sbin is also a symlink to /usr/sbin on modern Linux distros.
We can spot ip, reboot and poweroff when we ls inside /sbin.
/boot : Essentials for system boot
This directory contains all the boot files that the bootloader ( the program that loads the OS into memory for execution ) requires. It contains two of the most critical components of the system boot process: "vmlinuz" and "initrd".
vmlinuz : The actual Linux Kernel
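The /boot directory also contains the actual executable file of the Linux Kernel, stored in compressed form to save space. The name derives from "vmlinux" ( Virtual Memory LINUX ), with the final "z" indicating that the image is compressed ( historically with gzip ).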
initrd : Initial RAM disk
initrd or "Initial RAM disk" is a temporary root file system used until the real root file system is ready to be mounted. The bootloader loads the initrd into memory along with vmlinuz to carry out the system boot process.
In modern Linux distros like Debian and Ubuntu, initrd has been replaced by the faster and more efficient "initramfs" ( Initial RAM file system ), although the file in /boot is often still named initrd.img for historical reasons. Unlike initrd, which is mounted as a block device ( specifically a RAM disk ), initramfs is an archive that the kernel unpacks directly into a "tmpfs" ( Temporary File System ) in memory, which avoids the overhead of emulating a disk and makes it quite fast.
Curious behavior in WSL
Also, WSL ( Windows Subsystem for Linux ) does not use the traditional Linux bootloader found in regular Linux installations. WSL uses a special kernel that is provided and managed by Microsoft Windows.
When we run ls inside the /boot directory on WSL, it's often empty: the kernel is supplied by Windows rather than loaded by a bootloader from the distribution's own disk, so there are no traditional Linux boot files to keep there.
/dev : The branching begins !!
The /dev directory contains all physical devices ( storage disks, mice, etc. ) and virtual devices ( terminals, random number generators, etc. ) as separate files called device nodes, following the "everything is a file" paradigm of Linux.
/dev is a virtual file system residing in RAM, more precisely a "devtmpfs" ( Device Temporary File System ). It's mounted during system boot, and whenever a device is connected or disconnected, the kernel adds or removes the corresponding device node.
udev is a user space ( the part of the system outside the kernel, where user applications run ) daemon that handles device permissions, device events and the creation of symlinks. It acts as a management layer that modifies and organizes the device nodes after the kernel has populated them in devtmpfs.
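One way to confirm that /dev really is a devtmpfs is to look it up in the kernel's mount table. The sketch below is a simplified, hypothetical helper ( mount_fstype ) that takes text in the /proc/mounts format, assuming the usual Linux column order, and returns the file system type mounted at a given path.

```python
from typing import Optional


def mount_fstype(mountpoint: str, mounts_text: str) -> Optional[str]:
    """Return the file system type mounted at `mountpoint` in /proc/mounts-style text."""
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Columns: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[1] == mountpoint:
            # A later mount can stack on top of an earlier one, so keep the last match
            fstype = fields[2]
    return fstype
```

On a Linux machine it could be called as mount_fstype("/dev", open("/proc/mounts").read()). The sketch does not decode the \040 escapes /proc/mounts uses for spaces in paths, which is fine for /dev but not for arbitrary mount points.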
Interesting findings on /dev
- Let's look at what the ls -la output of the /dev directory looks like
$ ls -la /dev
Scrolling down a bit, we come across sda and tty, marked by the red arrows. sda refers to the first SATA disk device ( more generally, the first SCSI-type disk, since SATA and USB storage also appear as sd* devices ) and tty refers to the controlling terminal of the current process. Why am I mentioning these here? Because they show the two device types: block and character. The first character of a device node's permission string ( the "b" in the "brw-rw----" of sda ) tells us the device type.
Block devices : In these devices, data is read and written in fixed-size blocks. They support random access, so you can seek to any block, and their I/O goes through the kernel's buffer cache. Disks ( like sda ) are an example of block devices.
Character devices : For character devices, data is read and written as a stream of bytes. Access is typically sequential and does not go through the kernel's buffer cache. Audio devices, terminals ( tty ) and special files like /dev/random are examples of character devices.
You may also notice a pair of whole numbers ( 8, 0 in the case of sda ) separated by a comma after the ownership details. These are the major and minor numbers the kernel uses to identify which kernel driver handles which device instance on the system: the major number identifies the kernel driver, while the minor number identifies the specific device instance.
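The type letter and the major/minor pair can also be read programmatically. This minimal Python sketch ( device_info is a hypothetical name ) uses os.stat together with the standard stat module to report the same three pieces of information that ls -l prints for a device node.

```python
import os
import stat
from typing import Optional, Tuple


def device_info(path: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Return (type letter, major, minor) for `path`, as shown by `ls -l`."""
    st = os.stat(path)  # follows symlinks; use os.lstat to inspect a link itself
    type_letter = stat.filemode(st.st_mode)[0]  # 'b', 'c', 'd', '-', ...
    if stat.S_ISBLK(st.st_mode) or stat.S_ISCHR(st.st_mode):
        # st_rdev holds the device number; os.major/os.minor split it into its two parts
        return type_letter, os.major(st.st_rdev), os.minor(st.st_rdev)
    return type_letter, None, None
```

On Linux, device_info("/dev/null") returns ('c', 1, 3), and on a machine with a real first disk device_info("/dev/sda") should return ('b', 8, 0), matching the numbers seen in the listing.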
On the note of interesting findings, let's look at some special device files inside the /dev directory.
/dev/null
The /dev/null file has a special property: any data written to it is immediately discarded, and any read from it immediately returns EOF ( End-of-File ), so it behaves like an empty file. Among Linux users, this file is often called the "black hole" of the system.
One use case for this file is creating an empty file without using the touch command:
$ cp /dev/null emptyfile
We use the cp command to copy the contents of /dev/null ( which is nothing at all ) to a new file named "emptyfile" in the current directory.
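Both behaviours of /dev/null can be observed from a program as well. This is a small Python sketch with hypothetical helper names ( make_empty_file, discard ): the first mirrors cp /dev/null emptyfile, the second shows that a write succeeds even though the bytes go nowhere.

```python
def make_empty_file(path: str) -> None:
    """Create an empty file the way `cp /dev/null path` does."""
    with open("/dev/null", "rb") as src, open(path, "wb") as dst:
        dst.write(src.read())  # read() hits EOF at once and returns b""


def discard(data: bytes) -> int:
    """Write `data` to /dev/null; the write succeeds but the bytes are thrown away."""
    with open("/dev/null", "wb") as sink:
        return sink.write(data)
```

discard(b"hello") reports 5 bytes written, yet nothing can ever be read back from /dev/null.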
/dev/zero
The /dev/zero file is, in a sense, the opposite of the "black hole" /dev/null. Like /dev/null, it discards all writes, but every read from it returns an endless stream of "zero bytes". As the name suggests, a "zero byte" is a byte in which all the bits are set to zero ( written as \0 in C-like languages ).
This special device file is mostly used to :
Create dummy files of a fixed size
$ dd if=/dev/zero of=test.txt bs=1M count=1024
The above command uses the dd command ( often nicknamed "data duplicator" ). It takes /dev/zero as the input file and creates a new file named test.txt of 1 GiB ( 1024 blocks of 1 MiB each ) consisting entirely of zero bytes.
Below you can see test.txt shows up when we run ls -la inside the root directory.
Wipe already existing disk partitions
The following command needs root user privileges and should be used with caution.
Notice that sda1 ( a partition of the main SATA disk ) is selected to be wiped. BE VERY CAREFUL: we only want a single partition to be wiped ( only sda1 ), not the entire disk ( sda ), which can be fatal to the system. Double-check the target and make sure it is not mounted, because dd overwrites it irrecoverably.
$ sudo dd if=/dev/zero of=/dev/sda1 bs=1M status=progress
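Going back to the first use case, the dd command for creating a fixed-size file can be approximated in a few lines of Python. This is a rough sketch, not a replacement for dd ( make_zero_file is a hypothetical name ); bs and count play the same role as dd's bs= and count= operands.

```python
def make_zero_file(path: str, bs: int, count: int) -> None:
    """Rough analogue of `dd if=/dev/zero of=PATH bs=BS count=COUNT`."""
    with open("/dev/zero", "rb") as src, open(path, "wb") as dst:
        for _ in range(count):
            # /dev/zero never reaches EOF, so every read returns a full block of zero bytes
            dst.write(src.read(bs))
```

make_zero_file("test.txt", 1024 * 1024, 1024) would produce the same 1 GiB file as the dd command above. The wipe example is deliberately not reproduced here.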
/etc : Only text files allowed !!
/etc contains the configuration files the system needs to function. There is one special rule that /etc follows: it holds only static configuration files and never executable binaries ( a rule set by the Filesystem Hierarchy Standard ), and almost all of these files are plain text.
Interesting files inside /etc
/etc/passwd
It may seem that this file stores passwords, but it actually stores the user account information for a particular Linux system. As mentioned above, this file is plain text and can be read by all users on the system because it is "world-readable" ( can be read by anyone, no privileges needed ).
root:x:0:0:root:/root:/bin/bash
zeal:x:1000:1000:zeal_user,,,:/home/zeal:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
"root" ( the superuser ) and "zeal" ( a regular user ) are actual user accounts stored in this format, while "daemon" is a system account used to run background tasks without root administrative privileges.
Each user account entry follows this format :
7 colon-separated fields
username:x:uid:gid:comment(GECOS):home_directory:login_shell
username : As the name suggests, login name of the user
x : A placeholder meaning that the user's hashed password is stored in /etc/shadow
uid : The user id. 0 is root, 1-999 are system users like daemon and 1000+ are regular users ( the default ranges on Ubuntu and Debian )
gid : The id of the user's primary group.
comment (GECOS) : A generic comment field that can contain anything describing the user. It's named after GECOS ( General Electric Comprehensive Operating System ).
home_directory : The absolute path to the user's home directory
login_shell : The login shell of the user. You may notice /usr/sbin/nologin in the case of "daemon"; such entries belong to service accounts on which logins are prohibited.
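The 7-field layout above maps naturally onto a small record type. The following Python sketch is illustrative only: PasswdEntry, parse_passwd_line and the NOLOGIN_SHELLS set are hypothetical names, and the UID ranges follow the Ubuntu/Debian defaults mentioned above.

```python
from dataclasses import dataclass

# Shells that block interactive logins ( this set is an assumption; distros may differ )
NOLOGIN_SHELLS = {"/usr/sbin/nologin", "/sbin/nologin", "/bin/false"}


@dataclass
class PasswdEntry:
    username: str
    password: str  # "x" means the real hash lives in /etc/shadow
    uid: int
    gid: int
    gecos: str
    home: str
    shell: str

    def kind(self) -> str:
        # Ubuntu/Debian default UID ranges; special cases such as nobody (65534) are ignored
        if self.uid == 0:
            return "superuser"
        return "system" if self.uid < 1000 else "regular"

    def can_log_in(self) -> bool:
        return self.shell not in NOLOGIN_SHELLS


def parse_passwd_line(line: str) -> PasswdEntry:
    """Split one /etc/passwd line into its 7 colon-separated fields."""
    fields = line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    name, password, uid, gid, gecos, home, shell = fields
    return PasswdEntry(name, password, int(uid), int(gid), gecos, home, shell)
```

For real lookups, Python's standard pwd module ( for example pwd.getpwnam("zeal") ) already parses the account database; the sketch only makes the field layout explicit.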
/etc/shadow
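Remember the "x" in the /etc/passwd file? That "x" points to the hashed passwords of the user accounts, which are actually stored here in /etc/shadow. Unlike /etc/passwd, this file is not world-readable: only root ( and, on Ubuntu, members of the shadow group ) can read it, which keeps the hashes away from regular users.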
zeal:$6$abc123$XYZhashedpassword:19820:0:90:7:14:20000:
9 colon-separated fields
username:hash:last_changed:min_age:max_age:warning:inactive:expire:reserved
hash : the complete hashed string. $6$ states that the "SHA-512" algorithm was used, abc123 is the salt ( random text mixed with the password before hashing, so identical passwords produce different hashes ) and XYZhashedpassword is the resulting hash, not the password itself.
last_changed : the date of the last password change, counted in days since January 1, 1970.
min_age : minimum number of days before the user is allowed to change the password again.
max_age : maximum number of days the password stays valid before it must be changed.
warning : number of days before the password expires that the user is warned to change it.
inactive : number of days after the password expires until the user account is disabled.
expire : the date on which the account expires, also counted in days since January 1, 1970.
reserved : this field is reserved for future use.
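Because the date fields are stored as day counts, it helps to convert them when reading a shadow entry. The sketch below ( ShadowEntry and parse_shadow_line are hypothetical names ) splits the 9 fields, extracts the algorithm id and salt, and turns the day counts into calendar dates. It assumes the simple $id$salt$hash layout used by SHA-512; formats with extra parameters, such as yescrypt ( $y$ ) or a rounds= setting, are left unparsed.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

EPOCH = date(1970, 1, 1)  # shadow dates are counted in days since this day


def _int_or_none(field: str) -> Optional[int]:
    # An empty numeric field means "not set"
    return int(field) if field else None


def _date_or_none(field: str) -> Optional[date]:
    days = _int_or_none(field)
    return EPOCH + timedelta(days=days) if days is not None else None


@dataclass
class ShadowEntry:
    username: str
    algorithm: Optional[str]
    salt: Optional[str]
    last_changed: Optional[date]
    min_age: Optional[int]
    max_age: Optional[int]
    warning: Optional[int]
    inactive: Optional[int]
    expire: Optional[date]

    def must_change_by(self) -> Optional[date]:
        if self.last_changed is None or self.max_age is None:
            return None
        return self.last_changed + timedelta(days=self.max_age)


def parse_shadow_line(line: str) -> ShadowEntry:
    fields = line.rstrip("\n").split(":")
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}: {line!r}")
    user, hashed, last, min_age, max_age, warn, inactive, expire, _reserved = fields
    algorithm = salt = None
    parts = hashed.split("$")  # "$6$abc123$XYZ..." -> ["", "6", "abc123", "XYZ..."]
    if hashed.startswith("$") and len(parts) == 4:
        # Only the simple $id$salt$hash layout is handled here
        algorithm, salt = parts[1], parts[2]
    return ShadowEntry(user, algorithm, salt, _date_or_none(last),
                       _int_or_none(min_age), _int_or_none(max_age),
                       _int_or_none(warn), _int_or_none(inactive),
                       _date_or_none(expire))
```

For the sample entry, 19820 days after 1970-01-01 is 2024-04-07, so with a max_age of 90 the password must be changed by 2024-07-06, and the account itself expires on day 20000, which is 2024-10-04.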
/etc/hosts and /etc/resolv.conf
Both of these files are used for hostname resolution. /etc/hosts is a local lookup table of IP address and hostname pairs; if /etc/hosts does not have the required answer, /etc/resolv.conf tells the system which DNS nameservers to send the query to in the global DNS hierarchy.
/etc/hosts
127.0.0.1 localhost
127.0.0.2 home.dev_server
127.0.0.3 work.dev_server
.... ....
.... ....
/etc/resolv.conf
nameserver 1.1.1.1
nameserver 8.8.8.8
search dev_server
/etc/hosts contains the IP-hostname pairs, like 127.0.0.1 - localhost. /etc/resolv.conf lists the DNS nameservers to query if the local lookup fails. There's also a search directive that lets the system append suffixes to short names; for example, if ping home is used, the system will also try home.dev_server before it fails.
/etc/resolv.conf is a "run-time" file managed by a daemon such as NetworkManager or systemd-resolved ( on WSL, WSL itself generates it ), so manual changes made to this file are usually overwritten. Any required changes need to be made through the managing service's configuration files.
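The search behaviour can be modelled in a few lines. This is a simplified Python sketch ( parse_resolv_conf and candidate_names are hypothetical names ) that reads the nameserver and search directives and then lists the names a resolver would try, following the ndots rule described in the resolv.conf manual page: a name with fewer dots than ndots ( default 1 ) gets the search suffixes first, otherwise it is tried as-is first. Other options such as timeout and attempts are ignored.

```python
from typing import List, Tuple


def parse_resolv_conf(text: str) -> Tuple[List[str], List[str]]:
    """Return (nameservers, search domains) from resolv.conf-style text."""
    nameservers: List[str] = []
    search: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # comment lines start with '#' or ';'
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            nameservers.append(values[0])
        elif keyword == "search":
            search = values  # if several search lines exist, the last one wins
    return nameservers, search


def candidate_names(name: str, search: List[str], ndots: int = 1) -> List[str]:
    """Order in which a resolver would try `name`, following the ndots rule."""
    if name.endswith("."):
        return [name]  # a trailing dot marks a fully qualified name: no suffixes
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded  # enough dots: try the name as-is first
    return expanded + [name]  # short name: try the search suffixes first
```

With the sample resolv.conf, candidate_names("home", ["dev_server"]) gives ["home.dev_server", "home"], which matches the ping home example.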
This chain of first checking for a local answer and then querying a DNS nameserver if local resolution fails is defined by another /etc file, /etc/nsswitch.conf.
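The lookup chain can be sketched as "ask each source in the order listed on the hosts: line". In the Python sketch below, all function names are hypothetical and the DNS source is a stand-in dictionary, not a real resolver. It parses /etc/hosts-style text, reads a line such as hosts: files dns from nsswitch.conf-style text, and returns the first answer found. Action items like [NOTFOUND=return] change the real behaviour, but this simplified model drops them.

```python
from typing import Callable, Dict, List, Optional

Backend = Callable[[str], Optional[str]]


def parse_hosts(text: str) -> Dict[str, str]:
    """Map each hostname in /etc/hosts-style text to its address ( first match wins )."""
    table: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name, address)
    return table


def hosts_sources(nsswitch_text: str) -> List[str]:
    """Return the lookup sources listed on the `hosts:` line, in order."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line.startswith("hosts:"):
            # Action items such as [NOTFOUND=return] are dropped in this sketch
            return [tok for tok in line[len("hosts:"):].split() if not tok.startswith("[")]
    raise ValueError("no hosts: line found")


def resolve(name: str, sources: List[str], backends: Dict[str, Backend]) -> Optional[str]:
    """Ask each source in order and return the first answer."""
    for source in sources:
        backend = backends.get(source)
        if backend is None:
            continue  # sources with no model here ( e.g. mdns4_minimal ) are skipped
        answer = backend(name)
        if answer is not None:
            return answer
    return None
```

Passing {"files": parse_hosts(hosts_text).get, "dns": some_lookup} as backends reproduces the "local first, then DNS" order described above, where some_lookup stands for whatever DNS query function you plug in.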
Conclusion
Honestly, this discovery article made me rummage through a ton of resources on Linux and learn a lot about the Linux File System. I wish I could keep going and cover all the interesting and quirky directories of the Linux File System, but it's simply not possible in one article.
The Linux Filesystem Hierarchy is a remarkable piece of logical organization. By treating almost everything an OS operates on ( hardware devices, network sockets, system configs, system processes, etc. ) as a file, Linux provides an impressive level of transparency and control.
Keep an eye out for upcoming articles covering the directories left out here, like the essential and unique /proc directory, a virtual file system generated in real time by the kernel that keeps track of the processes running on the system.
