23

I understand that semicolons indicate the end of a line in languages like Java, but why?

I get asked this a lot by other people, and I can't really think of a good way to explain how it works better than just using line breaks or white space.

Jonathan Leffler
BlueThen

7 Answers

22

They don't signal end of line, they signal end of statement.

There are some languages that don't require them, but those languages don't allow multiple statements on a single line or a single statement to span multiple lines (without some other signal, like VB's _ line-continuation character).

Why do some languages allow multiple statements on a line? The philosophy is that whitespace is irrelevant (an end of line character is whitespace). This allows flexibility in how the code is formatted as formatting is not part of the semantic meaning.

Tergiver
  • Ah, yes. I do know some languages that still allow multiple statements on one line, but by separating with a special character: statement1 | statement2 | statement 3, and I know some that allow one statement across multiple lines: statement 1/3 + statement 2/3 + statement 3/3 (comments don't allow multiple lines apparently, so just imagine this was 3 separate lines!) – BlueThen Jan 15 '11 at 18:46
  • 3
    Well, you could make the semicolon optional and use it if you want to have multiple statements in one line. This is what JavaScript does. – Tim Büthe Mar 26 '13 at 10:58
20

First of all, the semicolon is a statement separator, not a line separator. Some languages use the newline character as the statement separator, but languages which ignore all whitespace tend to use the semicolon.

Why do languages ignore whitespace?

A language ignores whitespace to allow programmers to format the source code however they like. For example, in Java there is no difference between

if (welcome)
    System.out.println("hello world");

and

if (welcome) System.out.println("hello world");

This is not because there is one separate case for each of these in the grammar of the language, but because the whitespace is simply ignored.
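To make "the whitespace is simply ignored" concrete, here is a minimal sketch of a tokenizer that throws whitespace away. It is not how the Java compiler is actually implemented; the `tokenize` function and its regular expression are made up for illustration and only understand double-quoted strings, words and single punctuation characters.

```python
import re

def tokenize(source):
    """Split source into tokens, discarding all whitespace between them."""
    # A double-quoted string literal, a word/number, or one punctuation character.
    # (Hypothetical toy rules, far simpler than a real Java lexer.)
    return re.findall(r'"[^"]*"|\w+|[^\w\s]', source)

two_lines = 'if (welcome)\n    System.out.println("hello world");'
one_line = 'if (welcome) System.out.println("hello world");'

# Both layouts produce the same token list, so later stages cannot tell them apart.
print(tokenize(two_lines) == tokenize(one_line))
```

Because the line break and indentation never reach the parser, the grammar does not need a separate rule for each layout.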

Why does a programming language need a statement separator?

This is the core of the question. To understand it, let's consider a small language without any statement separator. It contains the following statement types:

var x = foo()
y[0, 1] = x
bar()

Here, y is a two-dimensional array and x is written to one of the entries of y.

Now let's look at these statements the way the compiler would see them:

var x = foo() y[0, 1] = x bar()

Because there is no statement separator, the compiler has to recognize the end of each statement by itself, to make sense of the input. Is the compiler able to do so? I guess in the above example the compiler can do it.

Now, let's add another type of statement to our language:

[x, y] = ["hello", "world"]

The multi assignment allows the programmer to assign multiple values at once. After this line, the variable x will contain the value "hello" while the variable y contains "world". This might be really handy to allow multiple return values from a function. Now how does this work together with the remaining statement types?

Consider the following sequence of statements:

foo()
[x, y] = [1, 2]

First, we call the method foo. Afterwards, we assign 1 to x and 2 to y. At least this is what we meant to do. Here is what the compiler sees:

foo() [x, y] = [1, 2]

Is the compiler able to recognize each statement? No. There are at least two possible interpretations. The first is the one we intended. Here is the second one:

foo()[x, y] = [1, 2]

What does this mean? First, we call the method foo. This method is supposed to return a two-dimensional array. Now, we write the array [1, 2] at the position [x, y] in the returned array.

The compiler cannot recognize the statements, since there are at least two valid interpretations of the given input. Of course, this should never happen in a real programming language. In the given example the ambiguity might be easy to resolve, but the point is that it is hard to design an unambiguous programming language without a statement separator. It is hard because the language designer has to consider every possible sequence of statement types to be sure the language is not ambiguous.
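The ambiguity can be checked mechanically. The following is a rough sketch, assuming a drastically simplified version of the toy language: every name in it (`STATEMENTS`, `tokenize`, `statement_kind`, `all_parses`) is hypothetical, and the "grammar" is just a set of regular expressions over space-joined tokens. It enumerates every way a token stream can be cut into valid statements when there is no separator.

```python
import re

# Hypothetical building blocks of the toy language, written over space-joined tokens.
CALL = r"\w+ \( \)"            # foo ( )
PAIR = r"\[ \w+ , \w+ \]"      # [ x , y ]
STATEMENTS = {
    "declaration": rf"var \w+ = (?:{CALL}|\w+)",
    "call": CALL,
    "index assignment": rf"(?:{CALL}|\w+) {PAIR} = (?:{PAIR}|{CALL}|\w+)",
    "multi assignment": rf"{PAIR} = {PAIR}",
}

def tokenize(source):
    """Words/numbers and single punctuation characters; whitespace is dropped."""
    return re.findall(r"\w+|[^\w\s]", source)

def statement_kind(tokens):
    """Return the kind of statement these tokens form, or None if they form none."""
    text = " ".join(tokens)
    for kind, pattern in STATEMENTS.items():
        if re.fullmatch(pattern, text):
            return kind
    return None

def all_parses(tokens):
    """Every way to split the tokens into a sequence of complete statements."""
    if not tokens:
        return [[]]
    parses = []
    for end in range(1, len(tokens) + 1):
        kind = statement_kind(tokens[:end])
        if kind is not None:
            for rest in all_parses(tokens[end:]):
                parses.append([kind] + rest)
    return parses

print(all_parses(tokenize("var x = foo()\ny[0, 1] = x\nbar()")))
print(all_parses(tokenize("foo()\n[x, y] = [1, 2]")))
```

With these rules the first program has exactly one reading (declaration, index assignment, call), while the second has two: a call followed by a multi assignment, or a single index assignment into the result of foo(). A statement separator would remove the second split point from consideration.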

Thus, the statement separator helps the language designer when initially designing the language, but more importantly it allows the designer to extend the language easily in the future, for example by adding new statement types. This is a big deal: once code has been written in your language, you cannot simply change the grammar for existing statement types, because that would stop all the existing code from compiling.

TL;DR

Summing it all up, the semicolon was introduced as a statement separator in whitespace-ignoring languages because it is easier to design and extend a language which has a statement separator.

Stefan Dollase
8

Many languages allow you to put in as much spacing as you like. This gives you control over how the code looks.

Consider:

 String result = "asdfsasdfs"
               + "asdfs"
               + "asdfsdf";

Because you are allowed to insert extra newlines, you can split that statement across several lines without any problem. The language still needs to know where the statement ends; that is why you need a semicolon.
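As a rough illustration of how a terminator makes layout irrelevant, here is a small sketch of a statement splitter. The `split_statements` helper is hypothetical and deliberately naive: it only knows about double-quoted strings, ignores escape sequences, and drops any trailing text that has no terminator.

```python
def split_statements(source, terminator=";"):
    """Split source at each terminator that is not inside a double-quoted string."""
    statements, current, in_string = [], [], False
    for ch in source:
        if ch == '"':
            in_string = not in_string
        if ch == terminator and not in_string:
            # Collapse runs of whitespace (including newlines) to a single space.
            # Note: this toy normalisation also touches whitespace inside literals.
            statements.append(" ".join("".join(current).split()))
            current = []
        else:
            current.append(ch)
    return statements

multi_line = 'String result = "asdfsasdfs"\n               + "asdfs"\n               + "asdfsdf";'
single_line = 'String result = "asdfsasdfs" + "asdfs" + "asdfsdf";'

print(split_statements(multi_line) == split_statements(single_line))
print(split_statements('int a = 1; String s = "a;b";'))
```

Both layouts of the concatenation come out as the same single statement, and two statements on one line are still told apart, because only the semicolon (outside a string) ends a statement.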

Winston Ewert
  • 2
    JavaScript allows this with optional semicolons, so they are not needed per se – Tim Büthe Mar 26 '13 at 11:00
  • 1
    If you put the + sign at the end of each line then the syntax could be parsed easily without the use of a semicolon. – Bernard Igiri Feb 27 '15 at 19:20
  • 1
    Right - and then you have imposed arbitrary layout restrictions so that you can sometimes omit semicolons. To me, that is a net loss in usability. –  Apr 28 '19 at 21:36
1

Languages do it because the semicolon signifies the end of a statement, not the end of a line, which means that you can compress code so that it takes up fewer lines.

Take the C++ code (#include <iostream>):

for(int i = 0; i < 5; ++i){
    std::cout << "did you know?" << std::endl; 
    std::cout << "; signifies **end of statement**" << std::endl;
    std::cout << "**not the end of the line**" << std::endl;
}

It could also be written

for(int i = 0; i < 5; ++i){std::cout << "did you know?" << std::endl; std::cout << "; signifies **end of statement**" << std::endl; std::cout << "**not the end of the line**" << std::endl;}
user2976089
1

Some programming languages use it to signify the end of a statement, which makes the language oblivious to whitespace as far as statements are concerned. One thing to bear in mind is that if the compiler has to check for either a newline or a semicolon, it has to assess several different "situations": it might misinterpret what you wanted to do, and looking for those situations takes a bit longer than simply looking for a semicolon at the end of the statement.

Some higher-level languages try to reduce semicolon use, or remove it altogether, to save a few keystrokes. These languages are more oriented toward the comfort of the programmer and generally come with all sorts of syntactic sugar; one could argue that not using semicolons is a kind of syntactic sugar.

Whether a language uses semicolons should depend on what the language is trying to accomplish. Languages like C and C++ are mostly about performance; Java and C# sit a bit higher in terms of abstraction; and then we have languages like Scala, Python and Ruby, which are designed mostly to make programming more comfortable at the cost of performance (Ruby openly admits this, and it is very pronounced in Python). So why do some languages "need" semicolons?

  • Makes compiling easier
  • The designer of the language thinks it's more consistent
  • Historical reasons (Java, C# and C++ are also C's children for example)

One last thing: JavaScript actually inserts the missing semicolons itself when it parses the code (automatic semicolon insertion), IIRC, so it's not actually semicolon-free.
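As one concrete example of the extra "situations" a newline-terminated language has to handle, Python's own tokenizer distinguishes a newline that ends a statement (NEWLINE) from one that does not (NL), for instance inside parentheses. A small sketch using the standard `tokenize` module; the sample `source` string is made up:

```python
import io
import tokenize

# One statement spread over two physical lines, followed by a second statement.
source = "total = (1 +\n         2)\nprint(total)\n"

kinds = [tokenize.tok_name[tok.type]
         for tok in tokenize.generate_tokens(io.StringIO(source).readline)]

print(kinds.count("NEWLINE"))  # newlines that end a statement
print(kinds.count("NL"))       # newlines that do not
```

For this input the tokenizer should report two NEWLINE tokens and one NL token: the line break inside the parentheses is not treated as the end of a statement. With a mandatory semicolon, that distinction would not need to exist.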

Immac
1

"Short answer: Because everyone else does it."

No, not everyone. Many popular languages, such as Python, Ruby and Visual Basic, use line breaks rather than semicolons to end a statement. Many languages (not "everyone") still use semicolons for historical reasons rather than because of any rational argument: semicolons played an important role in replacing the punched-card format in the early age of computing, but today they could be discarded entirely.

In fact, there are two popular ways to specify the end of a statement:

  1. Using a semicolon.
  2. Leaving it as is. This makes the compiler read a line break as the end of the statement. When you want to extend a statement over more than one line, you simply use a special character (like \ in Python) to say that the statement has not finished.

To make code more readable, using a special character to mark the end of a statement should be the exception, not the rule.
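The second approach can be observed directly in Python. This is a small sketch using the standard `ast` module to count how many top-level statements the parser sees; the `count_statements` helper and the sample snippets are just for illustration.

```python
import ast

def count_statements(source):
    """Return the number of top-level statements Python parses from source."""
    return len(ast.parse(source).body)

# A line break ends each statement.
print(count_statements("a = 1\nb = 2\n"))
# A trailing backslash says the statement continues on the next line.
print(count_statements("total = 1 + \\\n    2\n"))
# Python also accepts a semicolon to separate statements on one line.
print(count_statements("a = 1; b = 2\n"))
```

The first and last snippets each contain two statements, while the backslash-continued one is a single statement even though it occupies two physical lines.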

David Ragazzi
  • 4
    That sounds like a very opinion-based answer to me. – Domino Jul 12 '16 at 20:37
  • It seems very easy to criticize an answer, but you did not explain why, dude. Indeed, the only opinion was my last sentence ("In order to make a code more readable, using a special character to specify an end of statement should be an exception, not the rule.") But I wonder why the whole answer is a "very opinion-based answer" to you. – David Ragazzi Sep 02 '16 at 13:40
  • 1
    Perhaps my comment was a bit strong. Your answer is quite informative, but while you point out there is no rational reason for a language to favor semicolons over line breaks as instruction ends, you don't point out any reason to prefer the opposite other than "readability". To me, semicolons look closer to English punctuation than having each sentence on its own line. In any case, the worst idea is to have the interpreter guess where instructions end, like with JavaScript, which sometimes guesses wrong. – Domino Sep 02 '16 at 16:14
0

Short answer:

Because everyone else does it.

In theory a language's statement is whatever the language designer is able to syntactically interpret when they parse your file. So if the language designer did not want to have semicolons they could have periods, dashes, spaces, newlines, or whatever to denote the separation of a statement.

Language designers often make the syntax easy to understand so that it can become popular.

Wikipedia: Semicolon Usage in Computer Languages

So if some language designer created a language that used ':-)' to denote the end of a statement it would, 1) be hard to read; 2) not be popular with people who already are used to using a ';'.

echo "Take Care" :-)

Yzmir Ramirez
  • I see how it'd be pretty to a typical programmer, but I feel like that's only because they're used to this design. Why would the very first programming language to use semicolons, use it? Is it easier for the compiler? – BlueThen Jan 15 '11 at 18:52
  • 1
    Partially that, but JavaScript has optional semis which are nice because you CAN separate multiple statements on one line if you want to. But it also produces some gotchas where not using one can lead to confusion with popular JS patterns like using parens around functions to evaluate and fire them immediately after definition. I actually enjoy white-space end-of-statements but it's never bothered me in JS where I tend to write them after every line just to be explicit and avoid confusion for the next dev. It's an extra character here and there but we format in whatever fashion we like. That fits JS. – Erik Reppen Jan 08 '13 at 00:24
  • It's not "easier" to the compiler, it's easier for the programmer. I imagine we agree that it's necessary to be able to tell where a statement ends. If that is always "at the end of a line" then you're limited in layout, so it's useful to also have a separator that works within a line. Now we can have more than one statement per line, but we can't have more than one line per statement, again an annoying and unnecessary restriction. We can handle this by adding a statement-continuation character; now we've got a mess of three things to worry about. Saying "use a semicolon" is easy. –  Apr 26 '19 at 22:38