The file extension is typically everything after the last period. If a filename has no ".", it has no extension. What happens when the filename begins with a dot, as hidden files on Linux do?

In Python, such a file has no extension:

>>> import os
>>> os.path.splitext("base.ext")
('base', '.ext')
>>> os.path.splitext(".ext")
('.ext', '')

The common method in bash gives the opposite result: there is only an extension and no base part (see "Extract filename and extension in Bash"):

$ filename=".ext"
$ extension="${filename##*.}"
$ base="${filename%.*}"
$ echo "$base"

$ echo "$extension"
ext

How should code handle filenames such as this? Is there a standard? Does it differ per operating system? Or, failing that, which behavior is the most common or consistent?

[EDIT]
Let's say you have a file that's named just ".pdf". Should, for example, an open dialog list it by default, without 1. showing hidden files and 2. allowing all file extensions?

  1. It's a hidden file: it begins with a period.
  2. Is it actually a .pdf (by filename convention; sure, it has PDF data) or is it a file with no extension?
jozxyqk

1 Answer

File extensions in POSIX-based operating systems have no innate meaning; they're just a convention. Changing the extension wouldn't change anything about the file itself, just the name used to refer to it.

A file could have multiple extensions:

source.tar.gz

Sometimes a single extension represents a contraction of two:

source.tgz

Other files may not have an extension at all:

.bashrc
README
ABOUT
TODO
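
As an illustration only, Python's pathlib happens to treat these names the same way. This is one library's convention, not a standard; the `names` list simply repeats the examples above.

```python
from pathlib import PurePosixPath

names = ["source.tar.gz", "source.tgz", ".bashrc", "README"]

# .suffixes lists every dotted trailing component; .suffix is only the last one.
suffixes = {name: PurePosixPath(name).suffixes for name in names}
last_suffix = {name: PurePosixPath(name).suffix for name in names}

for name in names:
    print(f"{name!r}: suffixes={suffixes[name]} suffix={last_suffix[name]!r}")
```

Note that pathlib reports `.tar` and `.gz` separately for `source.tar.gz`, and nothing at all for `.bashrc`.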

Typically, the only thing that defines an extension is that it is a trailing component of a filename that follows a non-initial period. Meaning is assigned by the application examining the file name. A PDF reader may focus on files whose names end with .pdf, but it should not refuse to open a valid PDF file whose name does not.
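
A minimal sketch of that rule in Python, assuming a bare filename with no directory part. `split_extension` is a hypothetical helper written for this answer, not a library function, and it makes no attempt to handle names with several leading periods.

```python
import os.path


def split_extension(filename):
    """Split at the last period, unless that period is the first character."""
    dot = filename.rfind(".")
    # dot == -1: no period at all; dot == 0: the only period is the leading one.
    if dot <= 0:
        return filename, ""
    return filename[:dot], filename[dot:]


samples = ["base.ext", ".ext", ".pdf", "source.tar.gz", "README",
           "com.apple.ActivityMonitor.plist"]
for name in samples:
    # Print the sketch's result next to os.path.splitext for comparison.
    print(name, split_extension(name), os.path.splitext(name))
```

For these sample names the sketch and `os.path.splitext` give the same split, so `.pdf` comes back as a base name with an empty extension.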

Note that

extension="${filename##*.}"

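is simply an application of a parameter expansion operator: it removes the longest prefix matching `*.`, that is, everything up to and including the last period. The result is the (final) extension only if the filename contains a period that is not the leading one. For `.bashrc` it yields `bashrc`, and for a name with no period at all it yields the whole name. It's not an extension operator; it is a prefix-removal operator.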
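
To make the prefix-removal behavior concrete, here is a rough Python imitation of the two bash expansions from the question. Both function names are invented for this illustration, and the code only mirrors these two specific patterns, not bash pattern matching in general.

```python
def remove_longest_prefix_through_dot(filename):
    """Rough equivalent of bash "${filename##*.}"."""
    # rpartition returns ('', '', filename) when there is no period, so a
    # name without a period comes back unchanged, just as in bash.
    return filename.rpartition(".")[2]


def remove_shortest_suffix_from_dot(filename):
    """Rough equivalent of bash "${filename%.*}"."""
    head, sep, _ = filename.rpartition(".")
    # No period means the pattern does not match and nothing is removed.
    return head if sep else filename


for name in [".ext", "base.ext", "source.tar.gz", "README"]:
    print(repr(name),
          repr(remove_shortest_suffix_from_dot(name)),
          repr(remove_longest_prefix_through_dot(name)))
```

Neither function knows anything about extensions: `README` is returned whole by both, and `.ext` loses its entire base.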

chepner
  • Thanks! I'm aware of these, please see the edit. In the case of `.bashrc`, it's pretty obvious that it's the main filename, not an extension. What about `.pdf`? Going by the same rule, it should be said to have *no* extension. – jozxyqk Nov 20 '13 at 15:55
  • Whether or not there is any required connection between the contents of a file and its extension(s) (if any) is up to the operating system. `.pdf` *seems* like an empty file name with an extension because you are familiar with the use of `.pdf` to indicate a PDF file. Ask yourself what `.udn` is: is it simply a file that starts with a period, or an extension for some data type you aren't familiar with? This is why a good dialog box allows you to display all files, or give a user-specified filter, because there is no hard rule for how files must be named. – chepner Nov 20 '13 at 15:59
  • This is exactly my point. If an open file dialogue is built with an internal split-extension function, which should the split-extension choose? I don't think it should be context sensitive. – jozxyqk Nov 20 '13 at 16:02
  • File extensions are by definition context-sensitive. `.pdf` might be intended to be a hidden file; it might be a poorly named PDF file. There's no way to tell. Even the leading-period thing is just a convention; there's nothing special about the file itself. Programs like `ls` simply ignore them unless instructed otherwise. – chepner Nov 20 '13 at 16:17
  • One final comment: a period doesn't even necessarily indicate an extension: OS X plist files have names like `com.apple.ActivityMonitor.plist`. Only the `plist` component is considered an extension; everything preceding it is a dotted file name which is interpreted as a hierarchical description of who is responsible for creating the file and which program uses it. – chepner Nov 20 '13 at 16:18
  • Agreed. But as you say, programs like `ls` ignore them by default. So the creators have set a convention (which is not context-sensitive: `.` files are ignored, always), which I'd prefer to follow. In this case it seems more and more appropriate to ignore an initial `.`, if it exists, for filename extension parsing (at least on Linux). – jozxyqk Nov 21 '13 at 04:42
  • I'd like a link confirming that, by convention, an initial dot in a filename does not start the extension. To my mind, the bash behavior is what to expect: `.bashrc` has no base name and is basically just an extension. How does boost::filesystem cope with that? – v.oddou Nov 26 '13 at 07:23