
RFC 4180 states on page 2 that:

Within the header and each record, there may be one or more fields, separated by commas. Each line should contain the same number of fields throughout the file. Spaces are considered part of a field and should not be ignored. The last field in the record must not be followed by a comma.

So, per this standard, this would be invalid:

cat,dog,cow,

However, in principle this line should represent the four fields "cat", "dog", "cow" and "" (empty). So if a trailing comma always creates a new, empty "last" field, that field is never itself followed by a comma, and the rule can never actually be violated. In fact, to respect "Each line should contain the same number of fields throughout the file.", we would need exactly that trailing comma in a case like this:

aaa,bbb,ccc,ddd
cat,dog,cow,

And indeed, some programs that export CSV pad rows this way (e.g. Google Sheets).
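To see why a trailing comma always yields one extra, empty field, here is a minimal sketch of a record splitter written directly against the ABNF in RFC 4180 (`record = field *(COMMA field)`, `field = (escaped / non-escaped)`, `non-escaped = *TEXTDATA`). The name `split_record` is hypothetical; the sketch assumes one record per string with no embedded line breaks, and it is lenient: it only rejects stray quotes rather than enforcing the exact `TEXTDATA` character range.

```python
def split_record(line: str) -> list[str]:
    """Split one CSV record following the RFC 4180 ABNF.

    Assumptions (hypothetical helper): no trailing CRLF, no embedded
    line breaks, and only stray quotes are rejected in unquoted fields.
    """
    fields = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == '"':
            # escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
            i += 1
            buf = []
            while True:
                if i >= n:
                    raise ValueError("unterminated quoted field")
                if line[i] == '"':
                    if i + 1 < n and line[i + 1] == '"':
                        buf.append('"')  # 2DQUOTE encodes one literal quote
                        i += 2
                        continue
                    i += 1  # closing DQUOTE
                    break
                buf.append(line[i])
                i += 1
            fields.append("".join(buf))
        else:
            # non-escaped = *TEXTDATA, which may match zero characters
            j = line.find(",", i)
            end = n if j == -1 else j
            if '"' in line[i:end]:
                raise ValueError("bare quote in unquoted field")
            fields.append(line[i:end])
            i = end
        if i == n:
            return fields
        if line[i] != ",":
            raise ValueError("expected a comma after the field")
        i += 1  # consume COMMA: the grammar now requires another (possibly empty) field


print(split_record("cat,dog,cow,"))  # ['cat', 'dog', 'cow', '']
```

Because `*TEXTDATA` can match nothing, the text after the final comma is parsed as a complete, empty field, so the comma is followed by that field rather than ending the record.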

In conclusion, is the following the only correct way to respect the standard?

aaa,bbb,ccc,ddd
cat,dog,cow,""

Or is the rule simply wrong or redundant? Or am I misunderstanding it?
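For comparison, Python's standard `csv` module (used here only as one real-world reader, not as an authority on RFC 4180) parses the bare trailing comma and the explicit `""` into identical rows. The variable names are illustrative.

```python
import csv
import io

# The same padded row written two ways: a bare trailing comma vs. an explicit "".
bare = "aaa,bbb,ccc,ddd\r\ncat,dog,cow,\r\n"
quoted = 'aaa,bbb,ccc,ddd\r\ncat,dog,cow,""\r\n'

# newline="" disables newline translation, so the reader sees the raw CRLFs.
rows_bare = list(csv.reader(io.StringIO(bare, newline="")))
rows_quoted = list(csv.reader(io.StringIO(quoted, newline="")))

print(rows_bare)                 # [['aaa', 'bbb', 'ccc', 'ddd'], ['cat', 'dog', 'cow', '']]
print(rows_bare == rows_quoted)  # True
```

With this reader's default settings, an empty quoted field and an empty unquoted field both come back as `''`, so the two spellings are interchangeable in practice.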

CanisLupus

1 Answer


The rule is not wrong at all, but it must be read very literally: the last field must not be followed by a comma.

If the last field is empty, it is the second-to-last field that is followed by the comma, which is perfectly fine.

So this is OK:

a,b,c,d
x,y,z,
u,v,,
w,,,

but this is wrong:

a,b,c,d
x,y,z,
d,e,f,g,
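The two examples can be checked mechanically against the "same number of fields" rule. This is a small sketch using Python's `csv` module to count fields per record; the helper names `field_counts` and `same_field_count` are hypothetical.

```python
import csv
import io


def field_counts(text: str) -> list[int]:
    """Number of fields in each record, as parsed by Python's csv module."""
    return [len(row) for row in csv.reader(io.StringIO(text, newline=""))]


def same_field_count(text: str) -> bool:
    """RFC 4180: 'Each line should contain the same number of fields'."""
    return len(set(field_counts(text))) <= 1


ok = "a,b,c,d\nx,y,z,\nu,v,,\nw,,,\n"
wrong = "a,b,c,d\nx,y,z,\nd,e,f,g,\n"

print(field_counts(ok))     # [4, 4, 4, 4]
print(field_counts(wrong))  # [4, 4, 5]
```

In the second example the trailing comma gives the last record a fifth, empty field, which is what makes the field counts disagree.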

EDIT (from the discussion in the comments):

a,b,c,d,
e,f,g,h,
i,j,k,l,
m,n,o,p,

is also forbidden by the rule in question, because all lines end in a comma, which would mean the file has a last column that is empty on every line.
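Under this reading (Eugen Rieck's clarification in the comments), the trailing-comma rule only adds something when every record ends in a comma. A rough sketch of that check, where `every_record_ends_with_comma` is a hypothetical helper that inspects the raw text only: it assumes no quoted field contains a line break, and it does not flag a last column written as `""`.

```python
def every_record_ends_with_comma(text: str) -> bool:
    """Hypothetical check: True if every non-empty line ends in a comma,
    i.e. the file would have a last column that is empty on every line.
    Works on raw text; assumes no line breaks inside quoted fields."""
    lines = [line for line in text.splitlines() if line]
    return bool(lines) and all(line.endswith(",") for line in lines)


all_trailing = "a,b,c,d,\ne,f,g,h,\ni,j,k,l,\nm,n,o,p,\n"
padded_rows = "a,b,c,d\nx,y,z,\nu,v,,\nw,,,\n"

print(every_record_ends_with_comma(all_trailing))  # True
print(every_record_ends_with_comma(padded_rows))   # False (the header has no trailing comma)
```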

Eugen Rieck
  • Thanks Eugen. But in this case isn't the last example simply wrong because "Each line should contain the same number of fields throughout the file." and not because of the comma rule? I may very well be reading too much (or not enough) into this, but since `record = field *(COMMA field)`, and `field = (escaped / non-escaped)` and either escaped or non-escaped use `*TEXTDATA`, an empty field is still a field. – CanisLupus Dec 08 '19 at 11:36
  • What I mean in that case is "is the rule actually needed"? Does it add anything? It seems to always be valid, since adding a comma automatically adds an empty element. So ok, it's not wrong, but also not needed either...? :) The rule "Each line should contain the same number of fields throughout the file." is the one that actually makes your last example fail. The comma actually adds a 5th element in the last line, so "The last field in the record must not be followed by a comma." was still not broken (it never is?) but it ends up breaking the other rule because of that same 5th element. – CanisLupus Dec 08 '19 at 11:52
  • It is needed only if **all** lines end in a comma, meaning an empty last column is not allowed. – Eugen Rieck Dec 08 '19 at 12:04
  • Ah, that makes a lot more sense and is the only reasonable explanation (for me) for this rule. Could you add that to your answer? The last example is invalid CSV because of other rules, not this one in particular. – CanisLupus Dec 08 '19 at 16:44
  • *also because of other rules – CanisLupus Dec 08 '19 at 17:16