118

So, an absolute path is a way to get to a certain file or location by describing the full route to it, and it's OS dependent (absolute paths on Windows and Linux, for example, look different). A relative path, on the other hand, is a route to a file or location described from the current location, with `..` (two dots) indicating the parent level in the directory tree. That has been clear to me for several years now.

When searching I've even seen that there are canonicalized files too! All I know is that CANONICAL means something like "according to the rules" or something.

Can somebody enlighten me in terms of theory about canonical stuff?

hippietrail
Metafaniel
  • 1
    @DaveNewton The reason why I asked about a "canonicalized" path was because, using Aptana Studio 3, which tooltips commands, methods, etc., I read about `realpath`: "`PHP API.- realpath($path) @return` string the canonicalized absolute pathname on success". That's the reason I created this question, to understand that sentence clearly ;) – Metafaniel Aug 27 '12 at 15:46

5 Answers

128

The whole point of making anything "canonical" is so that you can compare two things. For example, both `../../here/bar/x` and `./test/../../bar/x` may refer to the same location (they do when the current directory's parent is named `here`), but a textual comparison of the two paths cannot tell you that. However, if you turn them into their canonical representation, they both become `../bar/x`, and we see that they actually refer to the same thing.

In short, it is often the case that you have many ways of referring to one thing, and in that case you may be able to define a canonical representation which is unique and which allows you to get a handle on collections of such things.

(If you're looking for more examples, all of mathematics is full of "canonical" constructions for all sorts of objects, and very much with the same purpose in mind. Maybe this Wikipedia article can provide some additional directions.)
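As a rough illustration of the comparison idea, here is a minimal Python sketch. It assumes a hypothetical working directory `/a/here/work` (chosen so that its parent is named `here`) and uses `posixpath.normpath`, which only collapses `.`, `..` and repeated separators lexically. It never touches the file system, so symbolic links are not taken into account.

```python
import posixpath


def resolve(base, relative):
    """Join a relative path onto an absolute base, then collapse '.', '..' and '//' lexically."""
    return posixpath.normpath(posixpath.join(base, relative))


# Hypothetical working directory (an assumption for this sketch); its parent is named "here".
BASE = "/a/here/work"

first = resolve(BASE, "../../here/bar/x")
second = resolve(BASE, "./test/../../bar/x")

print(first)            # /a/here/bar/x
print(second)           # /a/here/bar/x
print(first == second)  # True
```

With a different base, say `/a/there/work`, the two spellings no longer normalize to the same string, which is exactly why the raw text of the paths cannot be compared directly.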

Kerrek SB
  • 1
    Oooh I see now... So in short the canonical must be unique so it's the very full path with no relative use of the two dots. Ohhh! Your answer was helpful ;) Thanks – Metafaniel Aug 23 '12 at 21:45
  • @Metafaniel: Yes, the crucial part is uniqueness. I guess you can have both a canonical relative and a canonical absolute path, but that's an independent distinction. – Kerrek SB Aug 23 '12 at 21:46
  • 2
    All was good and right until that comment xD OK I've understood, but... a canonical relative path? As I've understood until now (thanks everyone) the canonical paths are absolute by nature. What can you tell me about the canonical relative paths? =S THANKS =) – Metafaniel Aug 23 '12 at 21:59
  • 3
    @Metafaniel: A canonical relative path would be relative to a given, fixed working directory. *Relative* to that given directory, you can form unique relative paths. But you can only compare those for the same working directory. By contrast, canonical absolute paths can be compared globally. – Kerrek SB Aug 23 '12 at 22:04
  • I had to read that comment several times to understand it clearly =/ xD Now it's clear enough! Thanks, you've been very kind ;) – Metafaniel Aug 23 '12 at 22:08
  • 1
    this doesn't work if you authorize hard or soft links on directories or files... In that case, defining a canonical path is much more difficult (see answer @alfasin). – Jean-Baptiste Yunès Aug 30 '16 at 14:56
  • How do you figure that `../../here/bar/x` is equivalent to `../bar/x`? – G-Man Says 'Reinstate Monica' Feb 21 '19 at 21:36
  • @G-Man: symlinks, I suppose. – Kerrek SB Feb 22 '19 at 01:32
  • The idea that relative paths can be canonical seems weak. They are clearly not unique then. I would say (based on the link) that at most you could say they are "normal" relative paths. – O'Rooney May 24 '22 at 04:20
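
To make the "relative to a given, fixed working directory" point from the comments concrete, here is a small Python sketch. The directory names are hypothetical, and `posixpath.relpath` is used purely lexically (absolute inputs, no file system access). It suggests that a normalized relative path is only meaningful together with the directory it was computed against.

```python
import posixpath

# Hypothetical absolute location of the file (an assumption for this sketch).
TARGET = "/a/here/bar/x"


def relative_form(target, working_dir):
    """Return the normalized relative path from working_dir to target (lexical only)."""
    return posixpath.relpath(target, start=working_dir)


from_work = relative_form(TARGET, "/a/here/work")
from_other = relative_form(TARGET, "/a/other")

print(from_work)   # ../bar/x
print(from_other)  # ../here/bar/x
```

Both strings name the same file, yet they differ, so they can only be compared when the working directory is the same. The absolute form `/a/here/bar/x` has no such dependency.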
43

A good way to define a canonical path would be: the shortest absolute path (short in the sense of string length).

This is an example of the difference between an absolute path and a canonical path:

absolute path: C:\abc\..\abc\file.txt
canonical path: C:\abc\file.txt

Canonicalization is a type of normalization which allows an object to be identified in a unique way. A relative path cannot do it, by definition.
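
The Windows example above can be reproduced with a short Python sketch. It uses the standard `ntpath` module, which applies Windows path rules on any platform. Note that `ntpath.normpath` is a purely lexical normalization: it collapses `..` segments but does not consult the file system, so it is only an approximation of full canonicalization.

```python
import ntpath

# The absolute (but not canonical) path from the example above.
absolute = r"C:\abc\..\abc\file.txt"

# Collapse the redundant "abc\.." segment lexically.
canonical = ntpath.normpath(absolute)

print(canonical)                 # C:\abc\file.txt
print(ntpath.isabs(absolute))    # True
print(ntpath.isabs(canonical))   # True
print(absolute == canonical)     # False
```

Both strings are absolute, but only the second is a form you could reasonably compare against other paths.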

For more info:

https://en.wikipedia.org/wiki/Canonicalization

https://en.wikipedia.org/wiki/Canonical_form

Nir Alfasi
  • 1
    According to @KerrekSB the path must be unique to be canonical, so in your example I can see it: there's no other way to represent C:\abc\file.txt Thanks for the example =) – Metafaniel Aug 23 '12 at 21:46
  • Yes, you can define a `canonical path` as the shortest absolute path (short, in the meaning of string-length). I'll add it to my answer. – Nir Alfasi Aug 23 '12 at 21:48
  • I dunno. Many Windows machines may still support "short" 8.3 paths, in which case the shortest equivalent of "C:\Program Files" would be "C:\PROGRA~1", which I don't think is what most people would generally consider to be canonical. – Dan Korn Oct 19 '15 at 23:44
  • @DanKorn in Windows the path is not case sensitive and canonical is something that should be unique, so by definition, there is no canonical path in Windows ;) – Nir Alfasi Oct 20 '15 at 03:24
  • 4
    @alfasin: Microsoft would seem to disagree with your assertion that "there is no canonical path in Windows," as evidenced by the Windows API function [PathCanonicalize](https://msdn.microsoft.com/en-us/library/windows/desktop/bb773569(v=vs.85).aspx). And while it's true that Windows generally does not enforce case-sensitivity in paths (though it depends on the volume), Windows does indeed preserve the case of each character in a path, even if it allows access case-insensitively. At any rate, my point still stands, that the shortest possible absolute path is not necessarily canonical. – Dan Korn Oct 20 '15 at 20:47
  • 2
    @DanKorn tell them to argue with [the definition of canonical](https://en.wikipedia.org/wiki/Canonical). As for your disagreement, you might be right, but I still haven't seen a *good* counter-example ;) – Nir Alfasi Oct 20 '15 at 21:03
  • 2
    @alfasin: It seems to me that, dictionary definitions aside, Microsoft gets to define what their own terms mean in their own operating system and APIs. Who is a higher authority about what's canonical in Windows than they are? And why is "C:\PROGRA~1" not a good counter-example to your assertion that the shortest path is canonical by definition? – Dan Korn Oct 20 '15 at 21:53
  • 4
    @DanKorn with all due respect to microsoft (and I *do* respect them) they don't get to define what is the meaning of canonical. Canonical means "unique" or "unique representation". Since Windows OS is not case sensitive there cannot be a single unique representation of any path, by definition. It can be absolute, but not canonical. If this reasoning doesn't look reasonable to you, you can disagree, but since this discussion is becoming futile (I keep repeating & explaining my words) let's cut it here and agree to disagree. – Nir Alfasi Oct 21 '15 at 00:45
  • 1
    The best answer – SuB Feb 02 '20 at 06:39
  • Considering the (much more upvoted) answer by @kerrek-sb says that canonical paths can be relative, this answer is either wrong or at least confusing. It would be good to update it to address the question of relative vs absolute. – O'Rooney May 23 '22 at 22:28
  • @O'Rooney I disagree with the "much more upvoted answer": based both on the [link he posted](https://en.wikipedia.org/wiki/Canonical_form) and on [this](https://en.wikipedia.org/wiki/Canonicalization): canonicalization is a type of normalization which allows an object to be identified in a unique way. A relative path cannot do it, by definition. Upvotes are not everything my friend :) – Nir Alfasi May 24 '22 at 00:23
  • 1
    Sure, I agree, upvotes are not everything. Perhaps explaining that "relative paths cannot be unique by definition" would help clarify your answer and that it explicitly conflicts with the other one :) – O'Rooney May 24 '22 at 04:18
8

A good definition of a canonical path is given in the documentation of readlink in GNU Coreutils. It is specified that 'Canonicalize mode' returns an equivalent path that doesn't have any of these things:

  1. hard links to self (.) and parent (..) directories
  2. repeated separators (/)
  3. symbolic links

The string length is irrelevant, as is demonstrated in the following example.

You can experiment with `readlink -f` (canonicalize mode) or its preferred equivalent command `realpath` to see the difference between an 'absolute path' and a 'canonical absolute path' for some programs on your system if you are running Linux or are using GNU Coreutils.

I can get the path of 'java' on my system using `which`:

$ which java
/usr/bin/java

This path, however, is actually a symbolic link to another symbolic link. This symbolic link chain can be displayed using namei.

$ namei $(which java)
f: /usr/bin/java
 d /
 d usr
 d bin
 l java -> /etc/alternatives/java
   d /
   d etc
   d alternatives
   l java -> /usr/lib/jvm/java-17-openjdk-amd64/bin/java
     d /
     d usr
     d lib
     d jvm
     d java-17-openjdk-amd64
     d bin
     - java

The canonical path can be found using the previously mentioned realpath command.

$ realpath $(which java)
/usr/lib/jvm/java-17-openjdk-amd64/bin/java
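
If you would rather not depend on a particular Java installation, the same behaviour can be sketched in Python. This is only an illustration under some assumptions: a POSIX system where creating symlinks is allowed, and made-up directory names (`opt/tool-1.0`, `alternatives`, `bin`) inside a temporary directory. `os.path.abspath` cleans up the path lexically, while `os.path.realpath` also follows the symlink chain.

```python
import os
import tempfile


def symlink_chain_demo():
    """Build bin/tool -> alternatives/tool -> opt/tool-1.0/bin/tool and compare path forms."""
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.realpath(tmp)  # the temp dir itself may sit behind a symlink

        # The real file at the end of the chain (hypothetical layout).
        target_dir = os.path.join(base, "opt", "tool-1.0", "bin")
        os.makedirs(target_dir)
        target = os.path.join(target_dir, "tool")
        open(target, "w").close()

        # Two levels of symbolic links pointing at it.
        alternatives = os.path.join(base, "alternatives")
        os.mkdir(alternatives)
        os.symlink(target, os.path.join(alternatives, "tool"))
        bin_dir = os.path.join(base, "bin")
        os.mkdir(bin_dir)
        os.symlink(os.path.join(alternatives, "tool"), os.path.join(bin_dir, "tool"))

        # A path with repeated separators plus '.' and '..' components.
        messy = base + "//bin/./../bin//tool"

        absolute = os.path.abspath(messy)    # lexical cleanup only; still names the symlink
        canonical = os.path.realpath(messy)  # also resolves both symlinks
        return absolute, canonical, target


if __name__ == "__main__":
    absolute, canonical, target = symlink_chain_demo()
    print("absolute: ", absolute)
    print("canonical:", canonical)
    print(canonical == target)
```

In this layout the canonical path is longer than the absolute one, which matches the point that string length is not what makes a path canonical.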
tlake29
7

What a canonical path is (and how it differs from an absolute path) is system dependent.
Typically, if a (full) path contains aliases, shortcuts or symbolic links, the canonical path resolves all of these into the actual directories they refer to.
Example: if /bin/a is a symlink, that is the path you get wherever you ask for an absolute path, e.g. from java.io.File#getAbsolutePath, while the real file (i.e. the actual target of the link), /usr/local/bin/a, would be returned as the canonical path, e.g. from java.io.File#getCanonicalPath.

Cratylus
  • Quite useful your comment for me to understand too! I haven't even thought on sym links right now! So a sym link it's absolute BUT not canonical... Very comprehensive. HOWEVER I've got a new doubt... What about hard links? Aren't they absolute enough? THANKS – Metafaniel Aug 23 '12 at 21:50
  • Depends. For example, java.io.File#getCanonicalPath does not resolve hard links – Cratylus Aug 23 '12 at 22:08
  • 1
    I haven't developed so much in Java, I'm mainly a PHP guy ;) Thanks for clarifying =) – Metafaniel Aug 23 '12 at 22:09
  • re. hard links: there's no such concept as resolving a hard link to a canonical form. If two files are hard linked to each other, they both point to the same data on disk, so there is no longer any canonical form, just two identical files whose data actually occupies the same space on disk. Which of the files was linked to the data on disk first is not relevant to the concept of canonical naming. See http://stackoverflow.com/questions/185899/what-is-the-difference-between-a-symbolic-link-and-a-hard-link – Connie Dobbs Feb 12 '14 at 17:11
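
On the hard-link question raised in these comments, a small Python sketch may help. It assumes a POSIX-like file system that supports hard links, and the file names are made up. Canonicalization with `os.path.realpath` leaves the two names distinct, whereas `os.path.samefile` compares the underlying device and inode and reports that they are the same file.

```python
import os
import tempfile


def hard_link_demo():
    """Create two hard-linked names and compare them by canonical path and by identity."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first.txt")
        second = os.path.join(tmp, "second.txt")

        with open(first, "w") as handle:
            handle.write("data")
        os.link(first, second)  # second name for the same data on disk

        same_canonical_path = os.path.realpath(first) == os.path.realpath(second)
        same_underlying_file = os.path.samefile(first, second)
        return same_canonical_path, same_underlying_file


if __name__ == "__main__":
    print(hard_link_demo())  # (False, True)
```

So neither name is "the" canonical one. If you need to know whether two paths reach the same data, an identity check is the tool, not path canonicalization.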
2

Most issues with canonical paths occur when you are passing the name of a directory rather than a file. For a file, an absolute path that contains no `.`, `..` or symbolic-link components is already the canonical path. For a directory, canonical form also means omitting the trailing "/". For example, "/var/tmp/foo" is a canonical path while "/var/tmp/foo/" is not.
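
A quick way to see the trailing-slash point is the following Python sketch. It uses `posixpath.normpath`, which is a lexical normalization only and is used here just to show that both spellings reduce to the same string.

```python
import posixpath

with_slash = "/var/tmp/foo/"
without_slash = "/var/tmp/foo"

# The raw strings differ, so a naive comparison fails.
print(with_slash == without_slash)  # False

# After normalization the trailing separator is gone and the two compare equal.
normalized = posixpath.normpath(with_slash)
print(normalized)                                        # /var/tmp/foo
print(normalized == posixpath.normpath(without_slash))  # True
```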