20

I've seen many of the minimum limits that an ANSI C compiler must support, such as 31 arguments to a function, and most of the numbers seem to make some kind of sense.

However, I cannot see the reasoning for supporting at least 509 characters in a source line. 511 or 512 would make more sense, but 509 seems kind of arbitrary.

What is the reason for this number?

Ryan Haining
  • 6
    See also: http://stackoverflow.com/questions/11488616/why-is-max-length-of-c-string-literal-different-from-max-char. While the question itself isn't a duplicate, some of the answers and comments there are. – mbauman Oct 14 '13 at 17:04
  • @Ryan Haining, are any of these "ANSI C compiler" requirements concerning 509 from compilers made in the last 10 years? – chux - Reinstate Monica Oct 14 '13 at 17:51
  • 1
    @Ryan Haining: "ANSI C" dates from 1989. Given the state of computers at that date (1 meg in a pc was finally getting not-rare), one can imagine the struggle between memory-conserving compiler writers and C standards that declare an absolute minimum of some sort for every parameter. That standard being off by a CR/LF pair is a perfectly good explanation. 4095 ...I'd have to check my code but I don't think I ever wanted to go over that minimum of 509 characters. – Jongware Oct 14 '13 at 20:06
  • Do 509 char long lines pass code reviews? – mouviciel Oct 16 '13 at 04:54
  • Possible duplicate of [Why is max length of C string literal different from max char\[\]?](https://stackoverflow.com/questions/11488616/why-is-max-length-of-c-string-literal-different-from-max-char) – Evan Carroll Jul 31 '18 at 18:58

4 Answers

17

This is perhaps to allow for possible CR + LF + '\0' characters, so that a string representation of each line still fits into 512 bytes of memory (509 + 2 + 1 = 512).
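
As a rough illustration of this guess (a sketch only: the 512-byte buffer is an assumption, not something the standard states, and `store_line`, `C89_MAX_LINE`, and `LINE_BUF_SIZE` are made-up names), a 509-character line plus CR, LF, and the terminator fills such a buffer exactly:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum {
    C89_MAX_LINE  = 509, /* C89 minimum for a logical source line */
    LINE_BUF_SIZE = 512  /* assumed buffer size; not taken from the standard */
};

/* Store `len` placeholder characters followed by "\r\n" and '\0' in buf.
 * Returns the number of bytes used, or 0 if they do not fit. */
size_t store_line(char buf[LINE_BUF_SIZE], size_t len)
{
    size_t needed = len + 2 + 1;   /* payload + CR LF + terminator */

    if (needed > LINE_BUF_SIZE)
        return 0;

    memset(buf, 'x', len);         /* stand-in for real source text */
    memcpy(buf + len, "\r\n", 3);  /* copies CR, LF and the trailing '\0' */
    assert(strlen(buf) + 1 == needed);
    return needed;
}
```

Under these assumptions, `store_line(buf, 509)` uses all 512 bytes, while a 510-character line no longer fits and the function returns 0.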

Digital Trauma
  • You sound very certain - do you have a citation? I can't find anything else on the topic. Given that C99 raised the limits for string literals and source lines to 4095, it makes this choice seem more arbitrary. Or at least, it seems like one of the two choices (2^n-1 or 2^n-3) is arbitrary. – mbauman Oct 14 '13 at 17:36
  • 2
    @MattB. - no citation - sorry :(. This is just based on educated guesswork. Perhaps by the time C99 came along, it was realized there is no need to store any `CR` or `LF` characters in a string representation of each line - More guesswork, I'm afraid. – Digital Trauma Oct 14 '13 at 17:39
3

The limits in the C11 draft (§5.2.4.1) differ from those given by the OP, which I suspect come from C89.

4095 characters in a logical source line

4095 characters in a string literal (after concatenation)
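
To compare the old and new limits, here is a small sketch (the helper names `next_pow2` and `slack` are hypothetical) that measures how far each limit sits below the next power of two. It only shows the arithmetic; it is not evidence of why either number was chosen.

```c
#include <assert.h>

/* Smallest power of two that is >= n (requires n >= 1). */
static unsigned next_pow2(unsigned n)
{
    unsigned p = 1;

    assert(n >= 1);
    while (p < n)
        p <<= 1;
    return p;
}

/* Bytes left over when `limit` characters sit in the next
 * power-of-two sized buffer. */
unsigned slack(unsigned limit)
{
    return next_pow2(limit) - limit;
}
```

`slack(509)` gives 3 and `slack(4095)` gives 1, so the older limit leaves three spare bytes in a 512-byte buffer while the newer one leaves a single spare byte in a 4096-byte buffer.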


[Edit] @jwodder suggested a more complete answer was needed.

The best I can offer: 512 bytes was the most common sector size for floppy disk, diskette, and hard drive media from roughly the mid-80s to the mid-90s, and that likely contributed, along with the thoughts of @bizzehdee and @DigitalTrauma, to the curious 509 limit.

It was a very popular buffer size.

chux - Reinstate Monica
  • 1
    This does nothing to explain *why* the limit is the number it is. – jwodder Oct 14 '13 at 22:59
  • @jwodder I agree the answer does not explain why the limit was 509 chars 24 to 14 years ago. The OP had no problem with values like 511 or 512 and so by using up-to-date values would seemingly equally have no trouble with 4095 - and thus no explanation was needed for current values. – chux - Reinstate Monica Oct 14 '13 at 23:07
  • Well, since we're talking about memory blocks, then the terminating `\r\n` hypothesis becomes less relevant (if you have a sized block terminating characters can be added back as needed), but you *do* need two octet bytes to describe numbers [0, 512), and you probably want to keep the `\0` for the C code that works better with null termination. – mtraceur Oct 21 '21 at 09:42
2

Straight from this question:

Perhaps 509 is intended to allow for a 512-byte buffer with two bytes for a "\r\n" line terminator and one for a '\0' string terminator.

Community
bizzehdee
0

I have no source, but I thought it was the two " characters and the \0 character that bring the total to 512 characters. I don't think the two extra characters are for CRLF, for two reasons: they are not characters you are required to write there, and on Linux the line ending is only LF. That's why I say it's the two " characters.

radl