The C# Language Specification says:
Conceptually speaking, a program is compiled using three steps:
- Transformation, which converts a file from a particular character repertoire and encoding scheme into a sequence of Unicode characters.
- Lexical analysis, which translates a stream of Unicode input characters into a stream of tokens.
- Syntactic analysis, which translates the stream of tokens into executable code.
...
This specification presents the syntax of the C# programming language
using two grammars. The lexical grammar (§2.2.2) defines how Unicode
characters are combined to form line terminators, white space,
comments, tokens, and pre-processing directives. The syntactic grammar
(§2.2.3) defines how the tokens resulting from the lexical grammar are
combined to form C# programs.
Because tokens are what get combined to form the program, whatever tokens remain after lexical analysis are what end up being compiled. Your question is about lexical analysis, specifically how comments, white space, and new lines affect which tokens are generated. The answer is that they don't affect the tokens at all, apart from being able to separate them:
Five basic elements make up the lexical structure of a C# source file:
Line terminators (§2.3.1), white space (§2.3.3), comments (§2.3.2),
tokens (§2.4), and pre-processing directives (§2.5). Of these basic
elements, only tokens are significant in the syntactic grammar of a C#
program (§2.2.3).
The lexical processing of a C# source file consists
of reducing the file into a sequence of tokens which becomes the input
to the syntactic analysis. Line terminators, white space, and comments
can serve to separate tokens, and pre-processing directives can cause
sections of the source file to be skipped, but otherwise these lexical
elements have no impact on the syntactic structure of a C# program.
So you can separate tokens with new lines, white space, or comments in any combination, and the program compiles the same as it would with minimal separation. Here are two examples that I compiled separately, each followed by its Intermediate Language (IL) output as shown by ILSpy:
static void Main(string[] args)
{
if
(true
)
/* comment separating `)` token from the `Console` token */
Console.WriteLine("something") /* another comment, semicolon token to the right */;
                    else // bunch of white space to the left
Console.
WriteLine("something else")
;
}
ILSpy output for the Main() method:
.method private hidebysig static
void Main (
string[] args
) cil managed
{
// Method begins at RVA 0x2088
// Code size 17 (0x11)
.maxstack 1
.entrypoint
.locals init (
[0] bool
)
IL_0000: nop
IL_0001: ldc.i4.1
IL_0002: stloc.0
IL_0003: ldstr "something"
IL_0008: call void [mscorlib]System.Console::WriteLine(string)
IL_000d: nop
IL_000e: br.s IL_0010
IL_0010: ret
} // end of method Program::Main
And here is the cleaner version, which produces identical IL:
static void Main(string[] args)
{
if (true) Console.WriteLine("something"); else Console.WriteLine("something else");
}
ILSpy output for the second version:
.method private hidebysig static
void Main (
string[] args
) cil managed
{
// Method begins at RVA 0x2088
// Code size 17 (0x11)
.maxstack 1
.entrypoint
.locals init (
[0] bool
)
IL_0000: nop
IL_0001: ldc.i4.1
IL_0002: stloc.0
IL_0003: ldstr "something"
IL_0008: call void [mscorlib]System.Console::WriteLine(string)
IL_000d: nop
IL_000e: br.s IL_0010
IL_0010: ret
} // end of method Program::Main
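To make the "only tokens are significant" point concrete, here is a minimal sketch of a toy lexer that throws away white space and comments and keeps everything else. It is not how the real C# compiler is implemented: `ToyLexer` and `Tokenize` are hypothetical names, and the regular expression only understands simple string literals, identifiers, and single-character punctuation. Under those assumptions, both layouts of the `if`/`else` statement above reduce to the same token sequence:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical toy lexer, for illustration only.
static class ToyLexer
{
    // Alternatives are tried left to right at each position:
    // trivia (white space, // comments, /* */ comments) first, then tokens.
    private static readonly Regex Pattern = new Regex(
        @"(?<trivia>\s+|//[^\n]*|/\*.*?\*/)" +
        @"|(?<token>""[^""]*""|[A-Za-z_]\w*|.)",
        RegexOptions.Singleline);

    public static List<string> Tokenize(string source)
    {
        // Keep only matches that came from the 'token' group; trivia is dropped.
        return Pattern.Matches(source)
            .Cast<Match>()
            .Where(m => m.Groups["token"].Success)
            .Select(m => m.Value)
            .ToList();
    }
}

static class Program
{
    static void Main()
    {
        string spreadOut =
            "if\n(true\n)\n/* comment */\n" +
            "Console.WriteLine(\"something\") /* another comment */;\n" +
            "        else // white space to the left\n" +
            "Console.\nWriteLine(\"something else\")\n;";

        string compact =
            "if (true) Console.WriteLine(\"something\"); " +
            "else Console.WriteLine(\"something else\");";

        List<string> a = ToyLexer.Tokenize(spreadOut);
        List<string> b = ToyLexer.Tokenize(compact);

        Console.WriteLine(string.Join(" ", a));
        Console.WriteLine(a.SequenceEqual(b)); // prints True
    }
}
```

Since the syntactic analysis only ever sees the token list, two sources that produce the same list are indistinguishable from that point on, which is consistent with the identical IL shown above.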