4

Reading _All possible C# array initialization syntaxes_, I wondered why C# always infers an array of int/Int32 where a smaller data type such as byte or short would suffice.

new[] { 30, 130, 230 } // byte[] suffices but becomes int[]
new[] { -1, 125, -119 } // sbyte[] suffices but becomes int[]
new[] { -31647, -1337, 23456 } // short[] suffices but becomes int[]
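
A quick check confirms the inferred element type (a minimal sketch; the variable name is illustrative):

    var widths = new[] { 30, 130, 230 };
    System.Console.WriteLine(widths.GetType()); // prints System.Int32[]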

In the referenced question, Eric Lippert states that the 'best type' is used (see below), but how is int the best possible type? If we are going for overkill, why not use long then?

The type of the array element is inferred by computing the best type, if there is one, of all the given elements that have types. All the elements must be implicitly convertible to that type.

I would suspect that processing 8- or 16-bit data types could be faster than 32-bit ones, e.g. when using SIMD, where four byte instances fit in the register space of one int/Int32. I know that SSE instructions are not (widely) used by the JIT compiler, but this 'int everywhere' approach ensures that such optimizations will not help much once the JIT compiler does include them.
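
To make the register-space point concrete, here is a minimal sketch using the System.Numerics.Vector<T> API (a later addition to .NET; Vector<T>.Count reports how many elements fit in one hardware SIMD register):

    using System;
    using System.Numerics;

    class VectorWidthDemo
    {
        static void Main()
        {
            // A single SIMD register holds four times as many bytes as ints
            // (e.g. 16 vs 4 on 128-bit SSE hardware), so byte-sized data
            // allows four times the throughput per vector instruction.
            Console.WriteLine(Vector<byte>.Count); // e.g. 16
            Console.WriteLine(Vector<int>.Count);  // e.g. 4
        }
    }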

Could someone elaborate on this and explain why it always resorts to int?

// Edit // I don't really care about the specification that prescribes that a literal without a suffix should be considered an int. To rephrase the question:

Why are data types used that are larger than needed? Why does the specification have this rule for literals? What are the advantages, given that the huge downside is losing out on future (SIMD) optimizations?

M. Mimpen
  • Java is not relevant here I guess. – Rohit Jain Feb 28 '14 at 13:27
  • I think on a 32-bit machine, a 32-bit structure is faster than other types of structure. – Rafa Feb 28 '14 at 13:29
  • _If we are going for overkill_ => we're not, manipulating int32 is generally faster than manipulating int8/int16 – ken2k Feb 28 '14 at 13:30
  • @RohitJain - Thanks for the clarification, I have edited my post to remove the reference to Java - and not only the keyword :) – M. Mimpen Feb 28 '14 at 13:37
  • What downside? What is preventing you from explicitly specifying the type when declaring the array? Quite honestly I don't understand your question. – InBetween Feb 28 '14 at 20:31

5 Answers

6

Why are datatypes used that are larger than needed?

The number of line-of-business applications where you're doing a calculation in integers and can guarantee that the result will fit into a byte or short is vanishingly small. The number of line-of-business applications where the result of an integer calculation fits into an int is enormous.

Why does the specification have this rule for literals?

Because it is a perfectly sensible rule. It is consistent, clear and understandable. It makes a good compromise between many language goals such as reasonable performance, interoperability with existing unmanaged code, familiarity to users of other languages, and treating numbers as numbers rather than as bit patterns. The vast majority of C# programs use numbers as numbers.

What are the advantages, given that the huge downside is losing out on future (SIMD) optimizations?

I assure you that not one C# programmer in a thousand would list "difficulty of taking advantage of SIMD optimizations" as a "huge downside" of C#'s array type inference semantics. You may in fact be the only one. It certainly would not have occurred to me. If you're the kind of person who cares so much about it then make the type manifest in the array initializer.

C# was not designed to wring every last ounce of performance out of machines that might be invented in the future, and particularly was not designed to do so when type inference is involved. It was designed to increase productivity of line-of-business developers, and line-of-business developers don't think of columnWidths = new [] { 10, 20, 30 }; as being an array of bytes.
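
(If you do want bytes, making the type manifest is straightforward; a minimal sketch, noting that the constant literals convert implicitly because each value fits in a byte:)

    // Explicit element type: each constant int literal is implicitly
    // converted to byte because its value fits in the byte range.
    byte[] columnWidths = new byte[] { 10, 20, 30 };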

Eric Lippert
4

C# 5.0 spec 2.4.4.2

• If the literal has no suffix, it has the first of these types in which its value can be represented: int, uint, long, ulong.

• If the literal is suffixed by U or u, it has the first of these types in which its value can be represented: uint, ulong.

• If the literal is suffixed by L or l, it has the first of these types in which its value can be represented: long, ulong.

• If the literal is suffixed by UL, Ul, uL, ul, LU, Lu, lU, or lu, it is of type ulong.

All of your examples hit the first in that list... int.

All integer literals follow this rule, which is why var i = 10; is inferred as int too.
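
A short illustration of those rules (a hedged sketch; the variable names are illustrative):

    var a = 10;          // int: first of int, uint, long, ulong that fits
    var b = 10U;         // uint: the U suffix restricts to uint, ulong
    var c = 5000000000;  // long: the value fits neither int nor uint
    var d = 10UL;        // ulong: the UL suffix forces ulong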

Simon Whitehead
  • Thanks for the quick answer, but since I'm not asking how literals are handled, I clarified the question. – M. Mimpen Feb 28 '14 at 13:42
  • Given your edit, is your question why does the specification have this rule for literals? – Simon Whitehead Feb 28 '14 at 13:44
  • Yes, fully correct. It seems I find it hard to get the right question. – M. Mimpen Feb 28 '14 at 13:45
  • Someone with more insight will have to answer that. However, I have a feeling that it's because arithmetic operations in the CLR aren't actually performed on anything smaller than a 32-bit `int`. Anything smaller than that is promoted so that the operation sits on the word size boundary of the platform. By inferring 32-bit as the default you avoid the (no doubt fairly small) promotion cost. – Simon Whitehead Feb 28 '14 at 13:47
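
(The promotion described in the last comment is visible in the language itself; a minimal sketch, noting that C# defines integer arithmetic only on int and wider types:)

    byte a = 1, b = 2;
    var sum = a + b;           // sum is int: byte operands are promoted to int
    byte back = (byte)(a + b); // narrowing back requires an explicit cast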
1

The literals you use as examples all have type System.Int32; while the values could be stored without loss in narrower integral types (e.g. System.Int16), the syntax says System.Int32.

As all the specified members of each array are System.Int32, the array has type System.Int32[].

Of course it would be possible to define a language where integral literals (without other indication such as suffixes) have type "the smallest integral type sufficient to hold the value", but that language is not C#.

In the latest – V5.0 – C# Language specification (from my VS2013 installation), in section 2.4.4.2:

Integer literals are used to write values of types int, uint, long, and ulong.

I.e. there is no way to write a byte, sbyte, short, or ushort literal without a cast.
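
(To illustrate; a minimal sketch, noting that an explicit cast or an implicit constant conversion on assignment gets you there:)

    var a = 200;       // int: an unsuffixed integer literal is never byte
    var b = (byte)200; // byte: only via an explicit cast
    byte c = 200;      // byte: implicit constant conversion on assignment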

Richard
  • Same as @Simon, the specification is one thing, but why has the language not been made as you finely describe "the smallest integral type sufficient to hold the value"? – M. Mimpen Feb 28 '14 at 13:43
  • @M.Mimpen Because that is what the designers decided. You'll need to ask them, unless, somewhere, they've written a rationale (but I don't think they have). – Richard Mar 01 '14 at 10:27
1

When you write an integer value without any suffix, such as 30, 130 or 230, you declare an Int32 value; so

new[] { 30, 130, 230 }; // <- array of ints

and if you want an array of bytes you have to say so explicitly:

new byte[] { 30, 130, 230 }; // <- treat each value as a byte
Dmitry Bychenko
-2

I believe that operations will always run faster at the native bit-size, so int for 32-bit machines; hence the convention.

This also implies that for 64-bit applications, int64 would be a better choice than int for arrays.

Starscream1984
  • Sounds reasonable, but do you have any proof that `int64` is used for `int32` operations on `64-bit` machines? – M. Mimpen Feb 28 '14 at 13:36
  • Ah no, I meant that it would be better for your program if you explicitly declared your numbers as int64 - in C# the int always means int32 – Starscream1984 Feb 28 '14 at 13:39
  • Sriram - I think the standard bitness of machines at the time is probably relevant to why the designers of C# originally chose int32 as their 'default' int bit-ness. – Starscream1984 Feb 28 '14 at 13:42