14

I assume that an internal casting happens when we write: arr[i] (which is equivalent to *(arr+i)). Because i can for example be a short, int or long or the unsigned variant of any of these three.

So my question is simple: which type should i be so that no internal conversion takes place? So that the code can run most efficiently?

Crude guess: size_t?

Mysticial
  • 464,885
  • 45
  • 335
  • 332
makhlaghi
  • 3,856
  • 6
  • 27
  • 34
  • I'd be more interested if you setup test cases for `int8_t` up to and including [`ptrdiff_t`](http://en.cppreference.com/w/cpp/types/ptrdiff_t) (my nominee), to see what the release-level asm actually looks like and just what the differences are (or aren't). – WhozCraig Feb 06 '14 at 07:16
  • FWIW, I can confirm that using the "wrong" index size does lead to sign/zero-extension instructions on array accesses. But at the same time, the compiler is pretty good at optimizing them out via [common subexpression elimination](http://en.wikipedia.org/wiki/Common_subexpression_elimination). – Mysticial Feb 06 '14 at 07:16
  • https://www.google.com/#q=c+right+type+array+index – user2485710 Feb 06 '14 at 07:19
  • @WhozCraig: You mean that it is different from system to system? – makhlaghi Feb 06 '14 at 07:19
  • @makhlaghi not sure I follow your comment. I mean if you're looking for optimal sooner or later you're going to need to see the instructions, out of curiosity if nothing else. I mean seeing the actual generated code difference when using identical algorithms (an enumeration of an array, for example) but with different index numeric types. – WhozCraig Feb 06 '14 at 07:23
  • additionally http://stackoverflow.com/questions/1464174/size-t-vs-intptr-t – user2485710 Feb 06 '14 at 07:24
  • The native array index type is `int` IIRC. – StackedCrooked Feb 06 '14 at 07:24
  • 1
    @StackedCrooked Not really. On my machine, `int` is 32 bits. But I can do 64-bit indexing. – Mysticial Feb 06 '14 at 07:25
  • @Mysticial: Thanks, I understand that the compiler might indeed optimize it. What type will the compiler use? Or How will the compiler optimze it? I know it depends on the compiler, take GCC for example. – makhlaghi Feb 06 '14 at 07:25
  • 2
    @makhlaghi So the compiler will first compile it to do an extension on every single array access. But then it will see that the extension is done to the same variable over and over again. So it will do it once and reuse the extended result. (aka Common Subexpression Elimination) – Mysticial Feb 06 '14 at 07:27
  • I want to see the use case where such concerns actually matter :P – thecoshman Feb 06 '14 at 07:31
  • @thecoshman: I am working on image arrays and until was using size_t for the index. Then I became curious that there might be a better type, so I thought of sharing this curiosity to see what others think. In large arrays (images) it should make a difference, won't it? – makhlaghi Feb 06 '14 at 07:33
  • ahh... so this is premature optimisation. Before mucking around with this nonsense, first make sure that your code is indeed actually slow. Write code that is easy to maintain first and foremost. Only when it is proven, with solid testing, to be 'too slow' should you consider making it harder to read/maintain for the sake of a few cycles here and there. Yes these cycles can add up, but maintenance of code is probably the hardest thing in programming. – thecoshman Feb 06 '14 at 07:47
  • The above comment is simplistic, especially for this application. See http://c2.com/cgi/wiki?PrematureOptimization – Jim Balter Feb 06 '14 at 07:58
  • 1
    The fastest way to access an array is not to use the subscript operator at all. For example `for (int i = 0; i < n; i++) f(array[i])` should better be written as `for (array_type* p = array; p < array+n; p++) f(*p)` Like this, you only need to add an address and an integer once. Another point: When it comes to performance, don't argue, measure! If you do what user2485710 proposed, you can benchmark your code once with e.g. `size_t` and once with `ptrdiff_t` and see if it makes a difference at all. Only if it does, you have to think about which to use. – gTcV Feb 06 '14 at 07:59
  • @user1734710: I do use that kind of array parsing when I am for example copying all the elements of one array (image) to another. But In some (rare) cases I need indices. – makhlaghi Feb 06 '14 at 08:55
  • @thecoshman: I don't understand how setting the type of a variable can be considered premature optimization. As user2485710 showed, I can define a type and change it when ever I like! What effect can this have on the readability or maintenance of the code? – makhlaghi Feb 06 '14 at 08:56
  • You can name it what ever you like ;-)! The way I see it, it is not fixing, it is correct usage. – makhlaghi Feb 07 '14 at 01:55
  • @thecoshman: Don't forget that it is only a temporary replacement because there appears to be no consensus on what is the correct type to use! Which is frankly very surprising for me! – makhlaghi Feb 07 '14 at 03:00
  • You might also be surprised to here there is no consensus on many things that don't really matter. – thecoshman Feb 07 '14 at 09:21

2 Answers2

3

It's unlikely to make any significant difference to performance. In any case, you should always be using the type that's semantically correct, not trying to make premature optimizations for things that aren't going to matter. In the case of indices, size_t is the smallest type that's a priori correct, but you may be able to get away with smaller types if you know the array you're working with is bounded. Usually though you should just use size_t to be safe.

R.. GitHub STOP HELPING ICE
  • 208,859
  • 35
  • 376
  • 711
  • Strictly speaking, `ptrdiff_t` is for indexing, not `size_t`. So many integer types... – user2357112 Feb 06 '14 at 07:34
  • Thank you, I was using `size_t` until now. But if it is not the correct type, implicit casting will take place any way, right? If we can avoid it, and it fits with the semantics of the code, why shouldn't we? – makhlaghi Feb 06 '14 at 07:36
  • 1
    Except on fake-32-on-64 targets like MIPS n32 or "x32" on x86_64, `size_t` is going to be the native register size anyway and is going to be at least as efficient as any other type for use as an index. BTW there's no such thing as an "implicit cast". Casts are by definition explicit operators for performing a conversion. – R.. GitHub STOP HELPING ICE Feb 06 '14 at 07:40
  • 2
    @user2357112 `ptrdiff_t` is signed and should be avoided. – Jim Balter Feb 06 '14 at 08:04
  • @JimBalter: Personally, I have used `size_t` before, but just for curiosity: why should signed be avoided? Because of performance or the fact that the index of an array can never be negative? – makhlaghi Feb 06 '14 at 09:02
  • 1
    @makhlaghi It should be avoided because it risks having a large positive value treated as a negative one. – Jim Balter Feb 06 '14 at 12:35
2

My answer is to keep this simple, maybe you want to typedef a type for you index like so

typedef size_t TPtrIdx;

and this way you can easily translate the type into whatever you want it to be, now or 1 year later.

Regarding the potential problems between conversions from/to signed and unsigned types, that's something that a good compiler can handle for you, for example gcc offers the -Wconversion flag, and is always a good thing to enable it.

user2485710
  • 9,451
  • 13
  • 58
  • 102