This is actually a very good observation. Shame on the people saying it is "just because". The spec isn't some magical tablet like the Ten Commandments that just pops into existence. It is debated extensively by members of the community (mediated by the W3C) until some sort of consensus is reached. Then developers (mostly browser developers, I imagine) take the spec and implement it. The interesting part is that the debate is open and public. So not only can you participate by giving your opinion on upcoming specs, you can also search back through the archives and find the arguments that led to a consensus: https://lists.w3.org/Archives/Public/www-style/2004Oct/
I haven't read the whole discussion, but I can imagine that an implementation without commas would be harder and less predictable, since each background holds a set of properties. For example, in the background shorthand it would be impossible to say where one background declaration ends and the next begins, since none of the properties is required. Take background: red url(img.png); Is that one background or two? With a comma, this is definitely two: background: url(img.png), red; (in the syntax that was eventually adopted, a color may only appear in the last layer, so the color goes at the end).
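To make the boundary argument concrete, here is a minimal Python sketch of how a comma lets a parser find layer boundaries without knowing anything about the individual properties. The function name split_layers is hypothetical and this is not how any browser actually parses CSS; it only assumes that layers are separated by commas outside of parentheses, and it ignores quoted strings.

```python
def split_layers(value: str) -> list[str]:
    """Split a background shorthand value into layers at top-level commas."""
    layers = []
    current = []
    depth = 0  # parenthesis nesting, so commas inside rgb(...) or url(...) are kept
    for ch in value.strip().rstrip(";"):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            # A top-level comma closes the current layer.
            layers.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    layers.append("".join(current).strip())
    return layers


print(split_layers("red url(img.png)"))                       # one layer
print(split_layers("url(a.png), rgb(255, 0, 0) url(b.png)"))  # two layers
```

Without the comma, the same job would require guessing from the property values themselves where one layer stops, which is exactly the ambiguity described above.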
That is just what pops into my head, but if you dig into the discussion you will see all sorts of considerations, such as the effect on other APIs (like JavaScript), backwards compatibility, etc.