1) Benchmark your application as a whole. Don't assume that you know where the perf bottlenecks in your application are. Experience shows again and again and again that humans generally suck at this. Do this on hardware and systems which are identical to production, or you're wasting your time.
2) Don't forget to structure your benchmark in such a way that the JIT compiler has kicked in for the code you care about. Around 10,000 invocations are typically needed before HotSpot compiles a method. Benchmarking interpreted-mode code is a total waste of time.
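A rough sketch of what that warmup structure looks like, hand-rolled (the method, iteration counts, and class name here are illustrative, not from the original; a proper harness such as JMH handles warmup for you and is the better choice in practice):

```java
public class WarmupBenchmark {
    // Stand-in for the method you actually care about.
    static long sumTo(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) total += i;
        return total;
    }

    public static void main(String[] args) {
        // Warmup: run well past HotSpot's default compile threshold (10,000
        // invocations) so the timed loop measures JIT-compiled code, not the
        // interpreter. Accumulate into 'sink' so the JIT can't discard the work.
        long sink = 0;
        for (int i = 0; i < 20_000; i++) sink += sumTo(1_000);

        // Only time the code after warmup.
        long start = System.nanoTime();
        for (int i = 0; i < 10_000; i++) sink += sumTo(1_000);
        long elapsed = System.nanoTime() - start;
        System.out.println("post-warmup time (ns): " + elapsed + ", sink=" + sink);
    }
}
```

Note the `sink` accumulator: without it, a sufficiently clever JIT can dead-code-eliminate the whole loop and you end up timing nothing.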
3) Once the most significant bottlenecks have been dealt with, many applications reach a state where the performance profile is dominated by the number of processor L1 cache misses. You can regard this as the point at which your application is reasonably well-tuned. Your algorithms may still suck, however, and there may still be loads of busywork going on in the system that you can get rid of.
4) Assuming your algorithms don't suck and there are no major chunks of busywork left to remove, this is the point at which, if the array / List difference is truly significant for you, you'll start to see it in the perf numbers.
5) Under most circumstances, you will find that the L1 cache situation will be better for arrays than for lists. However, this is general advice, not to be mistaken for actual performance tuning advice. Generate your own perf numbers and analyse them.
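As a minimal sketch of generating your own numbers (the class name, sizes, and repeat counts are made up for illustration; boxing in `ArrayList<Integer>` is one reason the array's cache behaviour tends to win, since the ints sit contiguously rather than behind a layer of object references):

```java
import java.util.ArrayList;
import java.util.List;

public class ArrayVsListSketch {
    static final int N = 1_000_000;

    static long sumArray(int[] a) {
        long total = 0;
        for (int v : a) total += v;   // contiguous primitives: cache-friendly
        return total;
    }

    static long sumList(List<Integer> l) {
        long total = 0;
        for (int v : l) total += v;   // boxed Integers: an extra indirection per element
        return total;
    }

    public static void main(String[] args) {
        int[] a = new int[N];
        List<Integer> l = new ArrayList<>(N);
        for (int i = 0; i < N; i++) { a[i] = i; l.add(i); }

        // Warm up both paths, then time them. Treat the numbers as indicative
        // only; use a real harness (e.g. JMH) for results you'd act on.
        for (int i = 0; i < 50; i++) { sumArray(a); sumList(l); }

        long t0 = System.nanoTime();
        long sa = sumArray(a);
        long t1 = System.nanoTime();
        long sl = sumList(l);
        long t2 = System.nanoTime();
        System.out.println("array: " + (t1 - t0) + " ns, list: " + (t2 - t1)
                + " ns, sums " + sa + "/" + sl);
    }
}
```

The sums are printed for the same reason as before: if the result is never used, the JIT is free to throw the loop away.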
tl;dr version: Read the long version. tl;dr has no place in Java performance discussion - this is subtle and complex stuff and the nuances matter.