2

I am writing my own programming language and am reconsidering many aspects of my syntax right now. Something that bothers me in most languages is the trailing apostrophe in character literals.

Example
With trailing apostrophe: 'n'
Without trailing apostrophe: 'n

Why do new languages (Rust, for example) keep using a trailing apostrophe?
Seeing such languages fix issues we had with older languages (OK, the trailing apostrophe is not really an issue) makes me think there must be a benefit to keeping it. I'd agree to keep it if it helped readability, but I don't think it does.
Here are some more exotic examples:

  • '\n   vs   '\n'
  • '\r   vs   '\r'
  • '\t   vs   '\t'
  • '\\   vs   '\\'
  • '\'   vs   '\''
  • '\"   vs   '\"'

Do we keep this syntax due to historical reasons or is there more to it that I don't yet understand?
Note that the trailing quotation mark in a string literal is necessary.

Noel Widmer

3 Answers

4

Interesting idea, but it appears one cannot represent the space character with this.
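
One way to read this objection, sketched in Python: with no closing apostrophe, a space literal would be an apostrophe followed by a trailing space, which is invisible and disappears as soon as an editor or formatter strips trailing whitespace. The reader function `read_open_char` is hypothetical and only models the proposed syntax; it is not taken from any real lexer.

```python
def read_open_char(src: str, pos: int) -> tuple[str, int]:
    """Read a hypothetical unterminated literal starting at src[pos] == "'".

    The body is one character, or a backslash plus one character.
    Returns (body, next_pos).
    """
    assert src[pos] == "'"
    end = pos + 3 if src[pos + 1] == "\\" else pos + 2
    return src[pos + 1:end], end


line = "x = ' "              # intended: x is the space character
body, next_pos = read_open_char(line, 4)
print(repr(body))            # the space survives only while the line is untouched

stripped = line.rstrip()     # trailing whitespace removed, as many tools do
try:
    read_open_char(stripped, 4)
except IndexError:
    print("nothing follows the apostrophe any more")
```

With a closing apostrophe, `' '` keeps the space between two visible delimiters, so stripping trailing whitespace cannot change the literal.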

OmegaNerd
  • **Your input is very good**. That is actually the reason why I have decided not to follow this concept anymore. – Noel Widmer Oct 24 '17 at 14:26
3

Here's what I think:

  1. It's simply more intuitive to see two matching quotes (or apostrophes) instead of just one.
  2. Two matching quotes are probably a little bit easier for lexing and parsing, especially in small tools (like editor plugins, highlighters), and web apps.
  3. Some literals like '\' might look quite confusing – many people might think this represents a \ (backslash).
  4. Some highlighters for similar languages (e.g. C/C++/C# if it's a C-like language) use the same rules for character literals as they do for string literals, so they might highlight everything after the first quote until another character literal is found (possibly highlighting multiple lines). This could be a problem with new languages that don't have their own highlighters for popular code editors.

Example: (the same as image; another similar issue)

// Normally looks like this:
var a = '\n'; // comment
var s2 = 'r'.Repeat(5).Replace('r', 'R'); // comment

// Only one 'single quote' with some highlighters:
var a = '\n; // comment
var b = '\'; // comment
var s1 = 'a + 'b + 'c  // comment
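
The highlighting problem from point 4 can be approximated with a small Python sketch. It assumes a highlighter that treats a character literal like a string literal, that is, everything from one apostrophe up to the next unescaped apostrophe. The pattern `QUOTED` and the function `highlighted_spans` are illustrative names, not the rule of any particular editor.

```python
import re

# Assumed highlighter rule: a quoted span runs from an apostrophe to the next
# unescaped apostrophe. The negated class also matches newlines, so a span
# may cross line boundaries.
QUOTED = re.compile(r"'(?:\\.|[^'\\])*'")


def highlighted_spans(source: str) -> list[str]:
    """Return every span this naive rule would colour as a literal."""
    return QUOTED.findall(source)


# Closed literal: only the literal itself is highlighted.
print(highlighted_spans(r"var a = '\n'; // comment"))

# Unterminated literals: the rule pairs the first apostrophe with the second.
print(highlighted_spans("var s1 = 'a + 'b + 'c  // comment"))

# Across lines: the span runs from the first line into the next one.
print(highlighted_spans("var a = 'n; // comment\nvar b = 'm;"))
```

In the last two calls the matched span contains operators, a comment, and even a line break, which is the kind of mis-highlighting described above.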
Ghost4Man
  • Don't post pictures of text here. Post the text. Waste of your time and our bandwidth. – user207421 Aug 03 '17 at 21:44
  • @EJP Okay, edited, but SO highlighter works in a slightly different way than some others, so the issue is not completely visible. – Ghost4Man Aug 03 '17 at 21:54
2

Because

  1. You don't want an error in a character literal to change the meaning of the entire rest of the program. There are digraphs and trigraphs for example, so quite a large space of possible character literals and quite a lot of ways to specify an invalid one. You don't want the part after the invalid character to become part of the rest of the program for scanning or parsing purposes. You want to know where to stop.

  2. You don't really want the legal character literal rules to be part of the lexical specification of the language, which has to be extremely stable. You want to be able to add a new digraph, trigraph, etc., without having to respecify the lexical rules of the language, and without changing the meaning of existing programs.
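
Both points can be illustrated with a minimal Python sketch, under the assumption that the lexer finds the end of a literal purely by looking for the closing apostrophe and checks the body afterwards. The names `KNOWN_ESCAPES`, `is_valid_body` and `lex_closed_char` are hypothetical and exist only for this example.

```python
# Assumed escape table; extending it does not change how literals are delimited.
KNOWN_ESCAPES = {"n", "r", "t", "\\", "'", '"'}


def is_valid_body(body: str) -> bool:
    """Validity check, kept separate from finding the end of the literal."""
    if len(body) == 1:
        return body != "\\"
    return len(body) == 2 and body[0] == "\\" and body[1] in KNOWN_ESCAPES


def lex_closed_char(src: str, pos: int) -> tuple[str, bool, int]:
    """Scan a closed literal starting at src[pos] == "'".

    Returns (body, is_valid, next_pos). The closing apostrophe alone decides
    where the literal ends, whether or not the body is legal.
    """
    assert src[pos] == "'"
    i = pos + 1
    while i < len(src) and src[i] != "'":
        i += 2 if src[i] == "\\" else 1   # a backslash protects the next char
    if i >= len(src):
        raise SyntaxError("unterminated character literal")
    body = src[pos + 1:i]
    return body, is_valid_body(body), i + 1


src = r"c = '\q' + 'x'"
first = lex_closed_char(src, 4)    # invalid escape, but the end is still known
second = lex_closed_char(src, 11)  # lexing resumes normally after the error
print(first, second)
```

The invalid `'\q'` is reported as invalid, yet scanning continues at the right position, and adding a new escape would only touch `KNOWN_ESCAPES`, not the scanning loop.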

user207421
  • Did you miss a word in the first sentence of your second paragraph? Also, what language supports digraphs as character literals? Never seen it and would like to know how it looks. – Noel Widmer Aug 04 '17 at 05:20
  • @NoelWidmer (1) Yes, see edit; (2) C, C++, Java, ... `\r` is a digraph, and Java's `\u0000` is a heptagraph. – user207421 Aug 05 '17 at 09:42