39

The TL;DR version, for those who don't want the background, is the following specific question:

Question

Why doesn't Java have an implementation of true multidimensional arrays? Is there a solid technical reason? What am I missing here?

Background

Java has multidimensional arrays at the syntax level, in that one can declare

int[][] arr = new int[10][10];

but it seems that this is really not what one might have expected. Rather than having the JVM allocate a contiguous block of RAM big enough to store 100 ints, it comes out as an array of arrays of ints: so each layer is a contiguous block of RAM, but the thing as a whole is not. Accessing arr[i][j] is thus rather slow: the JVM has to

  1. find the int[] stored at arr[i];
  2. index this to find the int stored at arr[i][j].

This involves querying an object to go from one layer to the next, which is rather expensive.

Why Java does this

At one level, it's not hard to see why this can't be optimised to a simple scale-and-add lookup even if it were all allocated in one fixed block. The problem is that arr[3] is a reference all of its own, and it can be changed. So although arrays are of fixed size, we could easily write

arr[3] = new int[11];

and now the scale-and-add is screwed because this layer has grown. You'd need to know at runtime whether everything is still the same size as it used to be. In addition, of course, this will then get allocated somewhere else in RAM (it'll have to be, since it's bigger than what it's replacing), so it's not even in the right place for scale-and-add.

What's problematic about it

It seems to me that this is not ideal, and that for two reasons.

For one, it's slow. A test I ran with these methods for summing the contents of a single dimensional or multidimensional array took nearly twice as long (714 seconds vs 371 seconds) for the multidimensional case (an int[1000000] and an int[100][100][100] respectively, filled with random int values, run 1000000 times with warm cache).

public static long sumSingle(int[] arr) {
    long total = 0;
    for (int i=0; i<arr.length; i++)
        total+=arr[i];
    return total;
}

public static long sumMulti(int[][][] arr) {
    long total = 0;
    for (int i=0; i<arr.length; i++)
        for (int j=0; j<arr[0].length; j++)
            for (int k=0; k<arr[0][0].length; k++)
                total+=arr[i][j][k];
    return total;
}   

Secondly, because it's slow, it thereby encourages obscure coding. If you encounter something performance-critical that would be naturally done with a multidimensional array, you have an incentive to write it as a flat array, even if that makes the unnatural and hard to read. You're left with an unpalatable choice: obscure code or slow code.

What could be done about it

It seems to me that the basic problem could easily enough be fixed. The only reason, as we saw earlier, that it can't be optimised is that the structure might change. But Java already has a mechanism for making references unchangeable: declare them as final.

Now, just declaring it with

final int[][] arr = new int[10][10];

isn't good enough because it's only arr that is final here: arr[3] still isn't, and could be changed, so the structure might still change. But if we had a way of declaring things so that it was final throughout, except at the bottom layer where the int values are stored, then we'd have an entire immutable structure, and it could all be allocated as one block, and indexed with scale-and-add.

How it would look syntactically, I'm not sure (I'm not a language designer). Maybe

final int[final][] arr = new int[10][10];

although admittedly that looks a bit weird. This would mean: final at the top layer; final at the next layer; not final at the bottom layer (else the int values themselves would be immutable).

Finality throughout would enable the JIT compiler to optimise this to give performance to that of a single dimensional array, which would then take away the temptation to code that way just to get round the slowness of multidimensional arrays.

(I hear a rumour that C# does something like this, although I also hear another rumour that the CLR implementation is so bad that it's not worth having... perhaps they're just rumours...)

Question

So why doesn't Java have an implementation of true multidimensional arrays? Is there a solid technical reason? What am I missing here?

Update

A bizarre side note: the difference in timings drops away to only a few percent if you use an int for the running total rather than a long. Why would there be such a small difference with an int, and such a big difference with a long?

Benchmarking code

Code I used for benchmarking, in case anyone wants to try to reproduce these results:

public class Multidimensional {

    public static long sumSingle(final int[] arr) {
        long total = 0;
        for (int i=0; i<arr.length; i++)
            total+=arr[i];
        return total;
    }

    public static long sumMulti(final int[][][] arr) {
        long total = 0;
        for (int i=0; i<arr.length; i++)
            for (int j=0; j<arr[0].length; j++)
                for (int k=0; k<arr[0][0].length; k++)
                    total+=arr[i][j][k];
        return total;
    }   

    public static void main(String[] args) {
        final int iterations = 1000000;

        Random r = new Random();
        int[] arr = new int[1000000];
        for (int i=0; i<arr.length; i++)
            arr[i]=r.nextInt();
        long total = 0;
        System.out.println(sumSingle(arr));
        long time = System.nanoTime();
        for (int i=0; i<iterations; i++)
            total = sumSingle(arr);
        time = System.nanoTime()-time;
        System.out.printf("Took %d ms for single dimension\n", time/1000000, total);

        int[][][] arrMulti = new int[100][100][100];
        for (int i=0; i<arrMulti.length; i++)
            for (int j=0; j<arrMulti[i].length; j++)
                for (int k=0; k<arrMulti[i][j].length; k++)
                    arrMulti[i][j][k]=r.nextInt();
        System.out.println(sumMulti(arrMulti));
        time = System.nanoTime();
        for (int i=0; i<iterations; i++)
            total = sumMulti(arrMulti);
        time = System.nanoTime()-time;
        System.out.printf("Took %d ms for multi dimension\n", time/1000000, total);
    }

}
chiastic-security
  • 20,430
  • 4
  • 39
  • 67
  • 2
    Java arrays are objects, but more importantly what problem does changing the internal representation of arrays help you solve? – Elliott Frisch Oct 11 '14 at 19:20
  • 2
    The only way to get an authoritative answer on this is to google it out. I don't think any SO member would be able to provide a first-hand answer. – Marko Topolnik Oct 11 '14 at 19:22
  • You could also do index math instead of having nested arrays. That's not necessarily better though, YMMV. – harold Oct 11 '14 at 19:22
  • @ElliottFrisch he explained early on in the post , having the array allocated in a linear block of ram, is faster(when accessing) than allocating each array alone. – vlatkozelka Oct 11 '14 at 19:27
  • @chiastic-security how is this a question to stackoverflow members ? although i agree with you on it , isn't this suposed to go on Oracle forums or something ? – vlatkozelka Oct 11 '14 at 19:29
  • 1
    @vlatkozelka Yes, but Java doesn't support or allow explicit memory management. The fact that arrays aren't primitives was a design decision. Knowing both of these things, doesn't really help me write a program or solve a problem. – Elliott Frisch Oct 11 '14 at 19:30
  • 8
    @vlatkozelka If I come across something that looks to me like it's a defect of a well established language that's nearly 20 years old, my first assumption is that I've missed something... and there are some very clever people around here that might be able to tell me what I've missed... – chiastic-security Oct 11 '14 at 19:37
  • @chiastic-security Apparently C#'s multidimensional arrays [take twice as long to iterate](http://stackoverflow.com/questions/5313832/multidimensional-arrays-in-java-and-c-sharp) as single dimensional ones too, so you won't necessarily gain anything by adding support in the language. I imagine Java doesn't have N-dimensional arrays because C++ didn't, while .NET has them because VB did. – that other guy Oct 11 '14 at 20:10
  • The language specification doesn't say much more than *arrays can be arrays or arrays*. There are a number of packages for working with matrices in java, so my guess is that most people who need contiguous memory already use them, and the topic hasn't been brought up since. One of the other issues with arrays is that they still can't be accessed using `long`. There was discussion of this changing at one point. – Jason Oct 11 '14 at 20:12
  • possible duplicate of [Fastest way to read/store lots of multidimensional data? (Java)](http://stackoverflow.com/questions/2966965/fastest-way-to-read-store-lots-of-multidimensional-data-java) – Bartosz Bilicki Oct 11 '14 at 21:17
  • 22
    This question **is off-topic** because it is **ranty blog post** disguised as a question! –  Oct 11 '14 at 21:18
  • https://stackoverflow.com/questions/2966965/fastest-way-to-read-store-lots-of-multidimensional-data-java covers possible solutions. – Bartosz Bilicki Oct 11 '14 at 21:18
  • 3
    Java doesn't because it's not designed that way. Why doesn't C have objects? Why doesn't Pascal have an operator for matrix inverse? – Hot Licks Oct 11 '14 at 21:21
  • 1
    This should be of interest: http://cr.openjdk.java.net/~jrose/pres/201207-Arrays-2.pdf – Marko Topolnik Oct 11 '14 at 21:21
  • 2
    @BartoszBilicki It does, but this question is about "why", i.e., about design decisions leading to people having to ask the question you linked. – maaartinus Oct 11 '14 at 21:24
  • 11
    @JarrodRoberson Where did you think I lost my temper and started ranting? The tone is measured throughout. And the post leads up to a clear technical question at the end, asking whether there's a reason that this feature hasn't been included. – chiastic-security Oct 11 '14 at 21:25
  • 9
    @JarrodRoberson Some people simply ask about how things work or why they work the way they do. If this is a rant, then it's **perfectly disguised** as I can't see here anything like this. – maaartinus Oct 11 '14 at 21:38
  • @chiastic-security: It's a good question, but it doesn't really fit the mould of an SO question. I flagged it for migration to programmers, where I think it might meet a warmer reception. – tmyklebu Oct 11 '14 at 22:09
  • 3
    The entire "what could be fixed" section seems pointless. In fact, this whole question is much larger than necessary. The core question is "Is there a technical reason Java does not support true multi-dimensional arrays, which are more performant than Java's form of multi-dimensional arrays". Everything else is tons of fluff that makes it very difficult to get to the actual question being asked. – Chris Hayes Oct 12 '14 at 06:41
  • 2
    @AndrewBarber I don't understand how this could be classed as too broad. It asks a question about a very specific missing feature from one particular language. I don't see what's broad about it. It has already attracted some good and quite specific answers. It also has 24 upvotes from people who have appreciated the question. Can you explain? – chiastic-security Oct 12 '14 at 18:13
  • The comments above explain pretty well. Especially the one above. Even the answers illustrate the broadness of the question. – Andrew Barber Oct 12 '14 at 18:20
  • @AndrewBarber Regarding that comment, I put lots of background in there because it seems useful to have all the background in the same place when one looks at the question; but as that comment suggests, the core question is pretty clear. Would it help if I had a clear pointer to it at the top for those who know the background? – chiastic-security Oct 12 '14 at 18:25
  • 1
    You state many very prominent questions in your post. You even gave them headers. The "core" question is buried in the middle with no real suggestion that it's much more important than the other questions. That said; that question isn't on topic here, while some of the other ones could be. – Andrew Barber Oct 12 '14 at 18:30
  • @AndrewBarber OK, I'd hoped it was clear those headings were rhetorical questions and part of the background. I've changed them to statements rather than questions. The core question is now right at the top. – chiastic-security Oct 12 '14 at 18:36
  • I put a finger exercise in micro benchmarking at [Ideone](http://ideone.com/hNERoh). Getting quite different results, especially compared to execution at Ideone - be sure to set number of iterations & such to reasonable sizes when running in your own environment. My conclusion: _with relatively large bottom dimension_ the differences between _single array&index arithmetic_ vs. _nested arrays_ are negligible. – greybeard Jan 30 '16 at 16:06

6 Answers6

21

but it seems that this is really not what one might have expected.

Why?

Consider that the form T[] means "array of type T", then just as we would expect int[] to mean "array of type int", we would expect int[][] to mean "array of type array of type int", because there's no less reason for having int[] as the T than int.

As such, considering that one can have arrays of any type, it follows just from the way [ and ] are used in declaring and initialising arrays (and for that matter, {, } and ,), that without some sort of special rule banning arrays of arrays, we get this sort of use "for free".

Now consider also that there are things we can do with jagged arrays that we can't do otherwise:

  1. We can have "jagged" arrays where different inner arrays are of different sizes.
  2. We can have null arrays within the outer array where appropriate mapping of the data, or perhaps to allow lazy building.
  3. We can deliberately alias within the array so e.g. lookup[1] is the same array as lookup[5]. (This can allow for massive savings with some data-sets, e.g. many Unicode properties can be mapped for the full set of 1,112,064 code points in a small amount of memory because leaf arrays of properties can be repeated for ranges with matching patterns).
  4. Some heap implementations can handle the many smaller objects better than one large object in memory.

There are certainly cases where these sort of multi-dimensional arrays are useful.

Now, the default state of any feature is unspecified and unimplemented. Someone needs to decide to specify and implement a feature, or else it wouldn't exist.

Since, as shown above, the array-of-array sort of multidimensional array will exist unless someone decided to introduce a special banning array-of-array feature. Since arrays of arrays are useful for the reasons above, that would be a strange decision to make.

Conversely, the sort of multidimensional array where an array has a defined rank that can be greater than 1 and so be used with a set of indices rather than a single index, does not follow naturally from what is already defined. Someone would need to:

  1. Decide on the specification for the declaration, initialisation and use would work.
  2. Document it.
  3. Write the actual code to do this.
  4. Test the code to do this.
  5. Handle the bugs, edge-cases, reports of bugs that aren't actually bugs, backward-compatibility issues caused by fixing the bugs.

Also users would have to learn this new feature.

So, it has to be worth it. Some things that would make it worth it would be:

  1. If there was no way of doing the same thing.
  2. If the way of doing the same thing was strange or not well-known.
  3. People would expect it from similar contexts.
  4. Users can't provide similar functionality themselves.

In this case though:

  1. But there is.
  2. Using strides within arrays was already known to C and C++ programmers and Java built on its syntax so that the same techniques are directly applicable
  3. Java's syntax was based on C++, and C++ similarly only has direct support for multidimensional arrays as arrays-of-arrays. (Except when statically allocated, but that's not something that would have an analogy in Java where arrays are objects).
  4. One can easily write a class that wraps an array and details of stride-sizes and allows access via a set of indices.

Really, the question is not "why doesn't Java have true multidimensional arrays"? But "Why should it?"

Of course, the points you made in favour of multidimensional arrays are valid, and some languages do have them for that reason, but the burden is nonetheless to argue a feature in, not argue it out.

(I hear a rumour that C# does something like this, although I also hear another rumour that the CLR implementation is so bad that it's not worth having... perhaps they're just rumours...)

Like many rumours, there's an element of truth here, but it is not the full truth.

.NET arrays can indeed have multiple ranks. This is not the only way in which it is more flexible than Java. Each rank can also have a lower-bound other than zero. As such, you could for example have an array that goes from -3 to 42 or a two dimensional array where one rank goes from -2 to 5 and another from 57 to 100, or whatever.

C# does not give complete access to all of this from its built-in syntax (you need to call Array.CreateInstance() for lower bounds other than zero), but it does for allow you to use the syntax int[,] for a two-dimensional array of int, int[,,] for a three-dimensional array, and so on.

Now, the extra work involved in dealing with lower bounds other than zero adds a performance burden, and yet these cases are relatively uncommon. For that reason single-rank arrays with a lower-bound of 0 are treated as a special case with a more performant implementation. Indeed, they are internally a different sort of structure.

In .NET multi-dimensional arrays with lower bounds of zero are treated as multi-dimensional arrays whose lower bounds just happen to be zero (that is, as an example of the slower case) rather than the faster case being able to handle ranks greater than 1.

Of course, .NET could have had a fast-path case for zero-based multi-dimensional arrays, but then all the reasons for Java not having them apply and the fact that there's already one special case, and special cases suck, and then there would be two special cases and they would suck more. (As it is, one can have some issues with trying to assign a value of one type to a variable of the other type).

Not a single thing above shows clearly that Java couldn't possibly have had the sort of multi-dimensional array you talk of; it would have been a sensible enough decision, but so also the decision that was made was also sensible.

Jon Hanna
  • 110,372
  • 10
  • 146
  • 251
  • 1
    the argument because we can make an abstraction we don't need a standard one doesn't hold; first makes it the path of higher resistence and likely have many distinct incompatible implementations many of which flawed, second makes it ok to use outside of proofs of concept. – Dmytro Apr 03 '17 at 20:35
  • @Dmitry no, it does hold, because people certainly don't *need* it. If they did then Java's not having it would mean people didn't use Java and clearly they do. One could argue that it remains useful enough that they *should* have it, and I do say above that that would also have been a sensible design decision, but it's not so necessary that the decision to leave it out is indefensible. – Jon Hanna Apr 04 '17 at 09:03
16

This should be a question to James Gosling, I suppose. The initial design of Java was about OOP and simplicity, not about speed.

If you have a better idea of how multidimensional arrays should work, there are several ways of bringing it to life:

  1. Submit a JDK Enhancement Proposal.
  2. Develop a new JSR through Java Community Process.
  3. Propose a new Project.

UPD. Of course, you are not the first to question the problems of Java arrays design.
For instance, projects Sumatra and Panama would also benefit from true multidimensional arrays.

"Arrays 2.0" is John Rose's talk on this subject at JVM Language Summit 2012.

apangin
  • 92,924
  • 10
  • 193
  • 247
  • There is one good question, however: why are arrays-of-arrays any *simpler* than contiguous arrays? Are they easier to reason about? And btw, have you ever needed, or do you know anyone who needed, a jagged array? – Marko Topolnik Oct 11 '14 at 19:57
  • 1
    @MarkoTopolnik - Array of arrays are simpler because then multi-dimension arrays can be ignored, save for some minor details in the compiler. Sorta like C largely ignores plain old arrays. – Hot Licks Oct 11 '14 at 21:24
  • 1
    @HotLicks You compactly expressed for what I needed so many words in my answer. – maaartinus Oct 11 '14 at 21:25
  • 1
    @MarkoTopolnik Arrays-of-arrays come for free since there are arrays of objects in Java and every array is an object, while multidimensional arrays are essentially new entities. They would require additional language and VM support while offering no new functionality. – apangin Oct 11 '14 at 21:25
  • 1
    @apangin - Actually, it wouldn't be terribly hard to support "true" (whatever that means) multi-dimension arrays. Perhaps the biggest problem would notational, to be able to declare a pointer to either flavor (and combinations of the two). But it is additional complexity that was not needed when Java was invented, and it still is of questionably utility -- the efficiency difference would be negligible in most cases. – Hot Licks Oct 11 '14 at 21:28
  • 2
    As John Rose points out, people don't *really* want 2D-arrays, they want their *benefits*, such as cache friendliness. He also explains how those benefits can be had within the current framework. – Marko Topolnik Oct 11 '14 at 21:42
  • 2
    And supporting "cache friendliness" would be a simple matter of adding a new bytecode to create the array. It would still be a collection of arrays, so would be addressed in the "usual" (for Java) way, but the individual arrays would be allocated together. Modest changes would be required of GC, no changes required to the bytecode interpreter or JITC (other than to add the new instruction). And of course javac would have to change to generate the instruction. (And JITCs could likely do this optimization without the new instruction, by recognizing the pattern.) – Hot Licks Oct 11 '14 at 22:15
  • @MarkoTopolnik: For arrays of non-trivial size, there's negligible difference in cache friendliness between a "true" 2D array and a Java 2D array. The Java version needs to maintain an extra set of references in cache, but that's it. – Oliver Charlesworth Oct 12 '14 at 09:42
  • @OliverCharlesworth Not exactly true: Any array with a small inner dimension (row length), regardless how large overall, might suffer. – Marko Topolnik Oct 12 '14 at 09:45
  • @MarkoTopolnik: Yeah, I guess that was careless wording when I wrote "non-trivial". What I really meant was "non-trivial inner dimension" ;) – Oliver Charlesworth Oct 12 '14 at 09:46
10

To me it looks like you sort of answered the question yourself:

... an incentive to write it as a flat array, even if that makes the unnatural and hard to read.

So write it as a flat array which is easy to read. With a trivial helper like

double get(int row, int col) {
    return data[rowLength * row + col];
}

and similar setter and possibly a +=-equivalent, you can pretend you're working with a 2D array. It's really no big deal. You can't use the array notation and everything gets verbose and ugly. But that seems to be the Java way. It's exactly the same as with BigInteger or BigDecimal. You can't use braces for accessing a Map, that's a very similar case.

Now the question is how important all those features are? Would more people be happy if they could write x += BigDecimal.valueOf("123456.654321") + 10;, or spouse["Paul"] = "Mary";, or use 2D arrays without the boilerplate, or what? All of this would be nice and you could go further, e.g., array slices. But there's no real problem. You have to choose between verbosity and inefficiency as in many other cases. IMHO, the effort spent on this feature can be better spent elsewhere. Your 2D arrays are a new best as....

Java actually has no 2D primitive arrays, ...

it's mostly a syntactic sugar, the underlying thing is array of objects.

double[][] a = new double[1][1];
Object[] b = a;

As arrays are reified, the current implementation needs hardly any support. Your implementation would open a can of worms:

  • There are currently 8 primitive types, which means 9 array types, would a 2D array be the tenth? What about 3D?
  • There is a single special object header type for arrays. A 2D array could need another one.
  • What about java.lang.reflect.Array? Clone it for 2D arrays?
  • Many other features would have be adapted, e.g. serialization.

And what would

??? x = {new int[1], new int[2]};

be? An old-style 2D int[][]? What about interoperability?

I guess, it's all doable, but there are simpler and more important things missing from Java. Some people need 2D arrays all the time, but many can hardly remember when they used any array at all.

maaartinus
  • 44,714
  • 32
  • 161
  • 320
9

I am unable to reproduce the performance benefits you claim. Specifically, the test program:

public abstract class Benchmark {

    final String name;

    public Benchmark(String name) {
        this.name = name;
    }

    abstract int run(int iterations) throws Throwable;

    private BigDecimal time() {
        try {
            int nextI = 1;
            int i;
            long duration;
            do {
                i = nextI;
                long start = System.nanoTime();
                run(i);
                duration = System.nanoTime() - start;
                nextI = (i << 1) | 1;
            } while (duration < 1000000000 && nextI > 0);
            return new BigDecimal((duration) * 1000 / i).movePointLeft(3);
        } catch (Throwable e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public String toString() {
        return name + "\t" + time() + " ns";
    }

    public static void main(String[] args) throws Exception {
        final int[] flat = new int[100*100*100];
        final int[][][] multi = new int[100][100][100];

        Random chaos = new Random();
        for (int i = 0; i < flat.length; i++) {
            flat[i] = chaos.nextInt();
        }
        for (int i=0; i<multi.length; i++)
            for (int j=0; j<multi[0].length; j++)
                for (int k=0; k<multi[0][0].length; k++)
                    multi[i][j][k] = chaos.nextInt();

        Benchmark[] marks = {
            new Benchmark("flat") {
                @Override
                int run(int iterations) throws Throwable {
                    long total = 0;
                    for (int j = 0; j < iterations; j++)
                        for (int i = 0; i < flat.length; i++)
                            total += flat[i];
                    return (int) total;
                }
            },
            new Benchmark("multi") {
                @Override
                int run(int iterations) throws Throwable {
                    long total = 0;
                    for (int iter = 0; iter < iterations; iter++)
                        for (int i=0; i<multi.length; i++)
                            for (int j=0; j<multi[0].length; j++)
                                for (int k=0; k<multi[0][0].length; k++)
                                    total+=multi[i][j][k];
                    return (int) total;
                }
            },
            new Benchmark("multi (idiomatic)") {
                @Override
                int run(int iterations) throws Throwable {
                    long total = 0;
                    for (int iter = 0; iter < iterations; iter++)
                        for (int[][] a : multi)
                            for (int[] b : a)
                                for (int c : b)
                                    total += c;
                    return (int) total;
                }
            }

        };

        for (Benchmark mark : marks) {
            System.out.println(mark);
        }
    }
}

run on my workstation with

java version "1.8.0_05"
Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)

prints

flat              264360.217 ns
multi             270303.246 ns
multi (idiomatic) 266607.334 ns

That is, we observe a mere 3% difference between the one-dimensional and the multi-dimensional code you provided. This difference drops to 1% if we use idiomatic Java (specifically, an enhanced for loop) for traversal (probably because bounds checking is performed on the same array object the loop dereferences, enabling the just in time compiler to elide bounds checking more completely).

Performance therefore seems an inadequate justification for increasing the complexity of the language. Specifically, to support true multi dimensional arrays, the Java programming language would have to distinguish between arrays of arrays, and multidimensional arrays. Likewise, programmers would have to distinguish between them, and be aware of their differences. API designers would have to ponder whether to use an array of arrays, or a multidimensional array. The compiler, class file format, class file verifier, interpreter, and just in time compiler would have to be extended. This would be particularly difficult, because multidimensional arrays of different dimension counts would have an incompatible memory layout (because the size of their dimensions must be stored to enable bounds checking), and can therefore not be subtypes of each other. As a consequence, the methods of class java.util.Arrays would likely have to be duplicated for each dimension count, as would all otherwise polymorphic algorithms working with arrays.

To conclude, extending Java to support multidimensional arrays would offer negligible performance gain for most programs, but require non-trivial extensions to its type system, compiler and runtime environment. Introducing them would therefore have been at odds with the design goals of the Java programming language, specifically that it be simple.

meriton
  • 68,356
  • 14
  • 108
  • 175
  • Ah, you've changed the type of `total` (the running counter) from `long` to `int`. Bizarrely this seems to be responsible for dramatically reducing the time difference! Why on earth would that be the case?! I've added my benchmarking code into the question. – chiastic-security Oct 11 '14 at 22:38
  • In fact this seems worthy of a new question, which I will write up tomorrow. (I have no idea why your answer got downvoted, by the way, but it wasn't me. In fact it looks like someone's just downvoted all the answers to this question.) – chiastic-security Oct 11 '14 at 22:41
  • 1
    Fixed, I now use a `long` as well. This does not significantly affect the measurements. – meriton Oct 11 '14 at 22:49
  • That is very odd, because it makes a huge difference to me (exactly the same output of `java -version` as you, running on Fedora). – chiastic-security Oct 11 '14 at 22:53
  • 2
    If I use your testing harness, and reduce iterations to 20000, I get output similar to my harness. If I set iterations to 100000, I get output similar to what you see. If I then change order of the benchmarks in the main method, I again get output similar to my harness. The large differences you saw are therefore likely an artifact of bad decisions by the just in time compiler, which is triggered for the main method before the second loop has been reaches, and therefore before the interpreter could gather any statistics about that loop. – meriton Oct 11 '14 at 23:07
  • 1
    @chiastic-security: "`long` slower than `int`" happens pretty often in Java when you're testing a simple loop. See http://stackoverflow.com/questions/19844048/why-is-long-slower-than-int-in-x64-java/ --- the JIT unrolls loops with `int` induction variables more aggressively than loops with `long` induction variables. – tmyklebu Oct 12 '14 at 03:41
  • It was definitely a bad idea to have both bencmarks in the same `main` method. Without On-Stack Replacement this wouldn't have worked at all. This should be done with `jmh` . – Marko Topolnik Oct 12 '14 at 06:36
3

Since this question is to a great extent about performance, let me contribute a proper JMH-based benchmark. I have also changed some things to make your example both simpler and the performance edge more prominent.

In my case I compare a 1D array with a 2D-array, and use a very short inner dimension. This is the worst case for the cache.

I have tried with both long and int accumulator and saw no difference between them. I submit the version with int.

@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation(X*Y)
@Warmup(iterations = 30, time = 100, timeUnit=MILLISECONDS)
@Measurement(iterations = 5, time = 1000, timeUnit=MILLISECONDS)
@State(Scope.Thread)
@Threads(1)
@Fork(1)
public class Measure
{
  static final int X = 100_000, Y = 10;
  private final int[] single = new int[X*Y];
  private final int[][] multi = new int[X][Y];

  @Setup public void setup() {
    final ThreadLocalRandom rnd = ThreadLocalRandom.current();
    for (int i=0; i<single.length; i++) single[i] = rnd.nextInt();
    for (int i=0; i<multi.length; i++)
      for (int j=0; j<multi[0].length; j++)
          multi[i][j] = rnd.nextInt();
  }

  @Benchmark public long sumSingle() { return sumSingle(single); }

  @Benchmark public long sumMulti() { return sumMulti(multi); }

  public static long sumSingle(int[] arr) {
    int total = 0;
    for (int i=0; i<arr.length; i++)
      total+=arr[i];
    return total;
  }

  public static long sumMulti(int[][] arr) {
    int total = 0;
    for (int i=0; i<arr.length; i++)
      for (int j=0; j<arr[0].length; j++)
          total+=arr[i][j];
    return total;
  }
}

The difference in performance is larger than what you have measured:

Benchmark                Mode  Samples  Score  Score error  Units
o.s.Measure.sumMulti     avgt        5  1,356        0,121  ns/op
o.s.Measure.sumSingle    avgt        5  0,421        0,018  ns/op

That's a factor above three. (Note that the timing is reported per array element.)

I also note that there is no warmup involved: the first 100 ms are as fast as the rest. Apparently this is such a simple task that the interpreter already does all it takes to make it optimal.

Update

Changing sumMulti's inner loop to

      for (int j=0; j<arr[i].length; j++)
          total+=arr[i][j];

(note arr[i].length) resulted in a significant speedup, as predicted by maaartinus. Using arr[0].length makes it impossible to eliminate the index range check. Now the results are as follows:

Benchmark                Mode  Samples  Score   Error  Units
o.s.Measure.sumMulti     avgt        5  0,992 ± 0,066  ns/op
o.s.Measure.sumSingle    avgt        5  0,424 ± 0,046  ns/op
Community
  • 1
  • 1
Marko Topolnik
  • 195,646
  • 29
  • 319
  • 436
  • 2
    By using `arr[0].length` instead of `arr[i].length`, you may force some array bounds checking... at least one additional check is needed. – maaartinus Oct 12 '14 at 13:33
  • I was looking to keep OP's methods intact---but you are right, the main point is comparing array performance head-to-head. Changing this meant a significant speedup: now the factor is just 2x. – Marko Topolnik Oct 12 '14 at 14:07
  • Interesting... The sizes are very asymmetrical, have you tried swapping them? This should help I lot, I guess. Sure, this leads to the endless question, what we want to measure... – maaartinus Oct 12 '14 at 14:10
  • This is deliberate (as I explain in the introduction): my goal is to measure the worst case, thus providing the upper bound on the penalty of cache-adversity. – Marko Topolnik Oct 12 '14 at 14:12
  • Probably you'd like even greater array to kick it out of L2. The greatest penalty for having many short arrays is increased amount of derefencing, the lack of prefetch and possibly re-arranging the memory layout. Anyways, it only makes sense to turn the problem at 90degrees and swap X and Y - but as you said the benchmark has to demonstrate the worst case scenario. – bestsss Oct 12 '14 at 20:53
1

If you want a fast implementation of a true multi-dimentional array you could write a custom implementation like this. But you are right... it is not as crisp as the array notation. Although, a neat implementation could be quite friendly.

public class MyArray{
    private int rows = 0;
    private int cols = 0;
    String[] backingArray = null;
    public MyArray(int rows, int cols){
        this.rows = rows;
        this.cols = cols;
        backingArray  = new String[rows*cols];
    }
    public String get(int row, int col){
        return backingArray[row*cols + col];
    }
    ... setters and other stuff
}

Why is it not the default implementation?

The designers of Java probably had to decide how the default notation of the usual C array syntax would behave. They had a single array notation which could either implement arrays-of-arrays or true multi-dimentional arrays.

I think early Java designers were really concerned with Java being safe. Lot of decisions seem to have been taken to make it difficult for the average programmer(or a good programmer on a bad day) to not mess up something . With true multi-dimensional arrays, it is easier for users to waste large chunks of memory by allocating blocks where they are not useful.

Also, from Java's embedded systems roots, they probably found that it was more likely to find pieces of memory to allocate rather than large chunks of memory required for true multi-dimentional objects.

Of course, the flip side is that places where multi-dimensional arrays really make sense suffer. And you are forced to use a library and messy looking code to get your work done.

Why is it still not included in the language?

Even today, true multi-dimensional arrays are a risk from the the point of view of possible of memory wastage/misuse.

Teddy
  • 4,009
  • 2
  • 33
  • 55
  • This was covered by the question: "Secondly, because it's slow, it thereby encourages obscure coding. If you encounter something performance-critical that would be naturally done with a multidimensional array, you have an incentive to write it as a flat array, even if that makes the unnatural and hard to read." – Marko Topolnik Oct 11 '14 at 21:40
  • 2
    @MarkoTopolnik after nearly half a century of being commonly done, this doesn't count as "obscure" any more. – Jon Hanna Oct 11 '14 at 21:49
  • 1
    It's not "unnatural and hard to read" if it's properly encapsulated. – Kevin Krumwiede Oct 11 '14 at 21:59
  • I suppose the question is whether this will end up being more efficient once you add the overheads of getters etc. I'd hope the JIT compiler would inline such things so that the overheads disappear, but I'm not certain either way. – chiastic-security Oct 11 '14 at 22:12
  • This gets you an object encapsulating a two-dimensional array, which isn't really the same thing as a two-dimensional array. – tmyklebu Oct 11 '14 at 22:13
  • @JonHanna Don't address me, I am quoting OP. BTW I am interested in your "half a century" remark---where was this needed in 1964? – Marko Topolnik Oct 12 '14 at 06:28
  • 1
    @chiastic-security Sure, all getters, setters, and many more trivial stuff get inlined, when the code gets hot, at least on a normal JVM. I'm not sure about Android, but some tools do it at compile time. – maaartinus Oct 12 '14 at 08:17
  • 1
    @MarkoTopolnik I said "nearly" half a century because I was thinking of C in 1972, though really the same technique would have been used in other languages before then so really it would be more than half a century, though I'm not sure just when having object-size handled for you (like C, in Java you only have to do `row*cols + col`, not `(row*cols + col) * size_el`) so I won't push the claim past "nearly half a century". – Jon Hanna Oct 12 '14 at 11:56
  • @JonHanna But in C you *don't* need to do the ugly indexing arithmetic explicitly, that's exactly the point of OP's question. Almost all languages of the era supported higher-ranked arrays first-class. Performance was a much bigger deal then, after all. – Marko Topolnik Oct 12 '14 at 12:07
  • @MarkoTopolnik Yes you do, if it's not statically allocated and you want it to be within a contiguous block of memory. You do remind me that I did neglect the statically allocated case in my own answer though. – Jon Hanna Oct 12 '14 at 13:44
  • @JonHanna Yes, I was thinking of static arrays, but you are right that we should compare Java array with dynamically allocated array in C. C at least supports this: `void f (int ary[][10], int rowCount);` for dynamic arrays. – Marko Topolnik Oct 12 '14 at 14:03
  • @MarkoTopolnik custom indexing math for nary array in the middle of the code is very smelly. You can make a function do it, but it's still smelly, you can put it in a class as a static member, but then you introduce a dependency into your code forcing jumps between files to be inspected. I'm sure a program that generated arbitrary bugged programs with indexing multidimensional arrays would make for a great bad time simulator. – Dmytro Apr 03 '17 at 20:42
  • What do you mean by "With true multi-dimensional arrays, it is easier for users to waste large chunks of memory by allocating blocks where they are not useful"? I allocated a multidimensional array here https://ideone.com/qgeKDC, and there is memory allocated for `arr[0]`. Each element of `arr[0]` is 0. If you replace `int` with `Integer`, `arr[0]` will contain nulls – Maksim Dmitriev Apr 25 '17 at 08:45
  • 1
    @MaksimDmitriev You have a point. It is possible to do wasteful allocation even in current implementation. – Teddy Apr 25 '17 at 09:48