Resolving symbolic links algorithm

Question

What should the algorithm for resolving symlinks on Linux look like?

Something like:

1. Split the path into components: /usr/bin/hello -> ['usr', 'bin', 'hello']
2. Resolve the first component: /usr -> /something1
3. Append the next component and resolve: /something1/bin -> /something2
4. Append the next component and resolve: /something2/hello -> /something3

Will that work?

By "resolving", do you mean forming a path to the same file that contains no symlinks? In that case, you have to accommodate several additional details, including (1) a symlink can point to another symlink; (2) a symlink can point to a multi-component path (any of whose components can be symlinks); (3) symlinks can be either absolute or relative; (4) symlinks can point to a non-existent path. — John Bollinger, Aug 12 '17 at 19:57
Yes, that's possible. This is why [`realpath()`](http://man7.org/linux/man-pages/man3/realpath.3.html) can fail with `ELOOP`. — melpomene, Aug 12 '17 at 20:05
`path_resolution(7)` contains information on how paths are resolved, including symbolic links. See also https://unix.stackexchange.com/a/99383/94871 — Pedro Gimeno, Oct 21 '18 at 19:43

score 2 · Answer 1 · answered Aug 12 '17 at 20:16

What you are actually looking for is the readlink command, which relies on POSIX realpath. Its algorithm is available here.
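To make the distinction concrete, here is a minimal sketch in Python rather than shell. It assumes that `os.readlink` and `os.path.realpath` are acceptable stand-ins for the `readlink(2)` system call and `realpath()`. The directory layout (`real`, `link`, `hello`) is made up for the demonstration.

```python
import os
import tempfile

# Build a throwaway tree: real/hello is a regular file, link -> real.
# realpath() on the temp dir keeps the base itself free of symlinks.
base = os.path.realpath(tempfile.mkdtemp())
os.mkdir(os.path.join(base, "real"))
target = os.path.join(base, "real", "hello")
open(target, "w").close()
os.symlink("real", os.path.join(base, "link"))  # relative symlink

via_link = os.path.join(base, "link", "hello")

# os.readlink returns the text stored in ONE symlink, nothing more.
one_step = os.readlink(os.path.join(base, "link"))

# os.path.realpath resolves symlinks in every component of the path.
resolved = os.path.realpath(via_link)

print(one_step)
print(resolved == target)
```

Reading a single link gives back only what the link stores, while the full resolution returns a path with no symlink components.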

As written in one book, the idea is this:

All path type resolution (checking) processing uses the presence or absence of a leading slash (/) to indicate whether the path is an absolute or relative path. If the slash is present, the first qualifier after the slash is compared against the MVS prefix to determine if it matches the prefix. If so, then the path type will be considered to be explicitly resolved via the prefix. If no match is found, or no slash was present, the implicit path type resolution heuristic is used.

Some details are also available here

GIZ · Answer 2 · 2017-08-13T17:57:43.203

Basically, when you request an I/O operation, the kernel has to go through a series of steps. It needs to search directories for the requested file. Finding a starting point isn't a problem, because the root directory has a constant inode number: inode 2 in the ext family of filesystems. Each directory is just a special kind of file that holds entries, each with (filename, inode) fields, so once the kernel locates the filename in a directory it can convert it to an inode number. By searching directories one component at a time, the kernel locates the file's inode.

Once the kernel finds the inode of a file, it has the block addresses for a regular file, and it uses them to locate the data stored in that file. The blocks at those addresses hold the file's actual data. The difference between a regular file and a symlink is that a symlink points to another location, so the kernel has to perform the same series of steps again: when it finds the inode of a symlink, it has to redo the lookup for the path that the symlink points to, searching directories for a matching filename in order to get the inode number. This obviously adds overhead.
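The lookup described above can be imitated in user space. The following is only a sketch of the idea, not the kernel's code: `resolve` and `MAX_LINKS` are hypothetical names, the cap of 40 is an assumption modelled on the limit documented in `path_resolution(7)`, and non-existent components are simply passed through.

```python
import errno
import os
import tempfile

MAX_LINKS = 40  # assumed cap on followed symlinks, see path_resolution(7)


def resolve(path):
    """Hypothetical helper: return an absolute path with no symlink components."""
    if not path.startswith("/"):
        path = os.path.join(os.getcwd(), path)
    # Stack of components still to process; the next one is pending[-1].
    pending = [c for c in path.split("/") if c not in ("", ".")][::-1]
    resolved = "/"  # always symlink-free
    followed = 0
    while pending:
        name = pending.pop()
        if name == "..":
            # resolved holds no symlinks, so its parent is the physical parent.
            resolved = os.path.dirname(resolved)
            continue
        candidate = os.path.join(resolved, name)
        if not os.path.islink(candidate):
            # Regular file, directory, or missing entry: accept as is.
            resolved = candidate
            continue
        followed += 1
        if followed > MAX_LINKS:
            raise OSError(errno.ELOOP, os.strerror(errno.ELOOP), path)
        link_target = os.readlink(candidate)
        if link_target.startswith("/"):
            resolved = "/"  # absolute target: restart from the root
        # Relative target: keep resolved, the directory holding the link.
        parts = [c for c in link_target.split("/") if c not in ("", ".")]
        pending.extend(reversed(parts))  # processed before the remaining path
    return resolved


# Demonstration on a made-up tree: l1 -> (absolute) l2 -> (relative) a
base = os.path.realpath(tempfile.mkdtemp())
os.mkdir(os.path.join(base, "a"))
open(os.path.join(base, "a", "file"), "w").close()
os.symlink("a", os.path.join(base, "l2"))
os.symlink(os.path.join(base, "l2"), os.path.join(base, "l1"))

result = resolve(os.path.join(base, "l1", "file"))
dotdot = resolve(os.path.join(base, "l1", "..", "a", "file"))
print(result)
```

Pushing the link target's components back onto the work stack is what lets one loop cover links to links, multi-component targets, and both absolute and relative targets.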

A recursive (a.k.a. cyclic) symlink is an invalid symlink: resolving it can never finish, so the kernel gives up and fails with `ELOOP`.
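A small experiment shows the failure. This is a sketch that assumes a Linux-like system where following a symlink cycle is reported as `ELOOP`. The link names `a` and `b` are made up.

```python
import errno
import os
import tempfile

base = tempfile.mkdtemp()
a = os.path.join(base, "a")
b = os.path.join(base, "b")
os.symlink("b", a)  # a -> b
os.symlink("a", b)  # b -> a, closing the cycle

is_loop = False
try:
    os.stat(a)  # stat follows symlinks, so the kernel must walk the cycle
except OSError as exc:
    is_loop = (exc.errno == errno.ELOOP)

# lstat does not follow the link, so it still succeeds on the same path.
lstat_ok = os.path.islink(a)
print(is_loop, lstat_ok)
```

The link itself still exists and can be inspected or removed. Only operations that follow it fail.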

Not sure if I've answered your question, but that's what generally happens. There is also the VFS layer on top, and below it the physical filesystem. Some filesystems, such as vfat, don't support symlinks at all.

