
Not sure about other implementations, but GCC uses an (apparently?) random mix of spaces and tabs for indentation.

See for example this file: it uses tab characters for indentation, but not everywhere. The reason is that the files use two spaces of indentation per level, while every chunk of 8 spaces is replaced by a tab character.

This means that the indentation an editor shows depends on how many columns it displays for each tab, which obviously forces users to set up their editor in such a way that tabs are consistent with spaces. By inspecting the linked file you'll see that, for the formatting to look right, a tab has to be 8 columns wide.
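The rule described above can be sketched in a few lines, assuming 2 columns per indentation level and a tab stop every 8 columns. `leading_whitespace` is a hypothetical helper written for illustration, not code taken from GCC or Emacs:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: leading whitespace for a given nesting level when each
// level is indent_width columns wide and every full tab_width columns are
// written as one tab character, with the remainder written as spaces.
std::string leading_whitespace(int level, int indent_width = 2, int tab_width = 8)
{
  assert(level >= 0 && indent_width > 0 && tab_width > 0);
  const int columns = level * indent_width;
  return std::string(columns / tab_width, '\t') +
         std::string(columns % tab_width, ' ');
}
```

With these defaults, levels 1 to 3 get 2, 4 and 6 spaces, level 4 gets a single tab, and level 5 gets a tab followed by 2 spaces, which is the mix visible in the file. With a tab width of 4, for example, a level-4 line would be drawn at column 4, the same as level 2.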

Is there any reason why nobody ever runs s/\t/        /g (replacing each tab with 8 spaces) on the whole codebase?
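Replacing every tab with a fixed string of 8 spaces is only exact when each tab starts at a tab stop (column 0, 8, 16, ...), as leading indentation does; a tab that follows other text only advances to the next multiple of 8. The POSIX `expand` utility, whose default is a tab stop every 8 columns, does the column-aware conversion on whole files. A minimal sketch of the same idea, where `expand_tabs` is a hypothetical name and single-byte characters are assumed:

```cpp
#include <cassert>
#include <string>

// Replace each tab with enough spaces to reach the next tab stop.
// Assumes single-byte characters and a line without embedded newlines.
std::string expand_tabs(const std::string& line, int tab_width = 8)
{
  assert(tab_width > 0);
  std::string out;
  for (char c : line) {
    if (c == '\t') {
      // Pad up to the next multiple of tab_width (1..tab_width spaces).
      const int pad = tab_width - static_cast<int>(out.size()) % tab_width;
      out.append(pad, ' ');
    } else {
      out.push_back(c);
    }
  }
  return out;
}
```

For example, `"ab\tc"` expands to `ab` followed by 6 spaces and then `c`, not 8 spaces, because the tab only has to reach column 8.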


Since I didn't really expect the use __ to be required (which it is), I'm asking this question in case I'm missing something crucial, so that the answer is not simply "because not everybody agrees that spaces are better than tabs".


Let me clarify one point: given a file generated like this

echo -en 'first line\n  second line\n\tthird line\n'

which has this content

first line
  second line
    third line # there's a tab and no spaces at the beginning of this line

no editor in the world, ever, knows what the correct way of showing this file is, because that depends on the convention. Stack Overflow seems to assume a tab is 4 spaces, whereas the GCC codebase assumes a tab is 8 spaces.

It is a convention and, as such, it can differ between codebases, and no editor can deduce it deterministically. Given the file above, no editor knows whether it has to show the third line indented with respect to the second line (thus guessing a tab is wider than 2 columns) or aligned with it (thus guessing a tab is exactly 2 columns), unless the user communicates that information via options.

Clearly, each editor can apply some heuristic; for instance, if a file is a C++ source file and it contains these two lines

  if (true) // <space><space>
    std::cout << "bye"; // <tab>

the editor can be fairly confident that each tab is at least 3 columns wide, which guarantees that the second line is indented with respect to the first; it could also deduce that a tab is at least 4 columns, applying the heuristic that "nobody uses 1-space indenting". But can it do more? Can it conclude that the tab is 4, 6, or 8 spaces? No, it can't. Full stop.
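Both points can be sketched in code. `indent_column` computes the column where the first non-blank character lands for a given tab width, so the third line of the sample file lands at column 2 with 2-column tabs (level with the second line) and at column 8 with 8-column tabs. `min_tab_width` is a hypothetical version of the heuristic just described, assuming the editor already knows which line is nested inside which (for instance from the C++ syntax). Both names are invented for illustration:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Column at which the first non-blank character of `line` appears for a
// given tab width: a space advances one column, a tab jumps to the next
// multiple of tab_width.
int indent_column(const std::string& line, int tab_width)
{
  assert(tab_width > 0);
  int col = 0;
  for (char c : line) {
    if (c == ' ')
      ++col;
    else if (c == '\t')
      col += tab_width - col % tab_width;
    else
      break;
  }
  return col;
}

// Hypothetical heuristic: each pair is (outer line, inner line), where the
// inner line is known to be nested in the outer one (say, an `if` and its
// body). Returns the smallest tab width in [1, max_width] that puts every
// inner line at least min_step columns deeper than its outer line, or 0 if
// no width in that range works.
int min_tab_width(const std::vector<std::pair<std::string, std::string>>& nested,
                  int min_step, int max_width = 16)
{
  for (int t = 1; t <= max_width; ++t) {
    bool consistent = true;
    for (const auto& p : nested) {
      if (indent_column(p.second, t) < indent_column(p.first, t) + min_step) {
        consistent = false;
        break;
      }
    }
    if (consistent)
      return t;
  }
  return 0;
}
```

For the `if` example, `min_tab_width` returns 3 with `min_step = 1` and 4 with `min_step = 2` (the "nobody uses 1-space indenting" assumption). Every larger width satisfies the constraint as well, which is why the heuristic only yields a lower bound and cannot choose between 4, 6 and 8.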

Enlico
  • There is a [bug report](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28732) for this, which was declined. At least, since GCC 11, [tabs are all 8 spaces](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86904). I can't seem to find a rationale for the coding convention of using tabs, though. – cigien Mar 28 '22 at 19:09
  • The comment _"tabs are all 8 spaces, so I wonder what your editor/debugger does here"_ at the first link is just nonsensical. If I have to step from one file (not from GCC) where tabs are 4 spaces to another (GCC) where tabs are 8 spaces, clearly my editor can't have two settings at the same time. Yes, some path-specific setting is a possible solution, but that doesn't change the fact that using tabs for indentation is simply the wrong thing to do. I hoped there was a rational reason for tabs, but I couldn't find one. And there isn't. – Enlico Mar 28 '22 at 19:47

2 Answers


I believe this originated as an "optimization" to reduce the size of the source file. By substituting a sequence of spaces with a tab character, the file displays just the same, but is up to 7 bytes shorter. Since source code tends to contain a lot of whitespace, this can add up to a substantial reduction.

Well, anyway, a reduction that would have been substantial in the early days of GCC (circa 1987), when 30 MB was a good-sized hard drive and RAM was over $100 per megabyte.

In particular, to this day, GNU Emacs formats files this way by default. And it's a good bet that much of GCC was written using GNU Emacs, given their common authorship...

Although the savings in file size are no longer very meaningful, the current GCC maintainers probably think it's fine and don't see a need to change it. And if you complain that your editor doesn't handle it properly, I'm sure they'll happily suggest another editor you could use :)
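The "up to 7 bytes" figure follows from simple arithmetic: a tab stands for a full 8-column chunk of indentation but occupies 1 byte, so each such tab saves exactly 7 bytes, while a leftover run of fewer than 8 spaces saves nothing. A minimal sketch, assuming 8-column tab stops and that only leading whitespace is tabified (`bytes_saved` is a hypothetical name):

```cpp
#include <cassert>

// Bytes saved by writing `columns` of leading blanks as tabs (one per full
// tab_width columns) plus leftover spaces, instead of as plain spaces.
// Each tab replaces tab_width spaces with 1 byte.
int bytes_saved(int columns, int tab_width = 8)
{
  assert(columns >= 0 && tab_width > 0);
  const int tabified = columns / tab_width + columns % tab_width;
  return columns - tabified;  // == (tab_width - 1) * (columns / tab_width)
}
```

For instance, a line indented 6 columns saves nothing, one indented 10 columns (a tab plus 2 spaces) saves 7 bytes, and one indented 22 columns saves 14.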

Nate Eldredge
  • There is no editor that can handle that formatting properly without being told what the convention is, or applying some heuristic (which, given the size of the code base, could come up with a really accurate guess, I agree). And if the convention is changed, one has to tell the editor. – Enlico Mar 31 '22 at 07:49
  • But the second link of yours clearly answers my question, I believe. – Enlico Mar 31 '22 at 07:51
  • @Enlico: The Emacs default is documented as 8, and AFAIK in those days, 8 was pretty standard. Emacs also provided the solution of [file variables](https://www.gnu.org/software/emacs/manual/html_node/emacs/File-Variables.html), where the file itself contains a comment that tells Emacs how to format it. – Nate Eldredge Mar 31 '22 at 14:29
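For illustration, a file-variables line for the style discussed here might look like the first line below. The particular values are an assumption chosen to match a 2-column indent with 8-column tabs, and only Emacs (or another editor that chooses to honor this syntax) will act on them; `indent-tabs-mode`, `tab-width` and CC Mode's `c-basic-offset` are standard Emacs variables:

```cpp
// -*- mode: c++; indent-tabs-mode: t; tab-width: 8; c-basic-offset: 2 -*-
// Emacs reads the first line: indentation may use tabs, tab stops are
// every 8 columns, and each indentation level is 2 columns.

// Hypothetical function, present only so the file has some indented code.
int clamp_non_negative(int x)
{
  if (x < 0)
    return 0;
  return x;
}
```

Editors that ignore file variables still need the same settings configured by hand, which is the limitation raised in the comments above.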

This will happen with clang-format when IndentWidth differs from TabWidth.

Internally, clang-format works with spaces; only at the end are spaces converted to tabs. Which spaces, if any, are converted is controlled by the UseTab setting (see the docs).

IndentWidth=2 and TabWidth=8, with UseTab set to anything other than Never, will produce the observed formatting.

I am not familiar with GCC's git repository policy on formatting, though.

Quimby