0

Edit: Okay my question has been answered. Thank you. Initially I had doubts about using an array of 1 million cause I read it caused some problems in C, so thanks for your responses everyone!

Okay hi guys, I have a school assignment where I have to code a binary search to search for a piece of data in a set of data that may be up to 1 million in size.

I'm planning to just stick to numbers so the binary searching itself should be pretty easy. The data will simply be tons of randomly generated numbers (sorted) onto a text file and I plan to get the program to open the file and load all the data onto the array.

However up to now I've simply been using array sizes of up to several hundred. So here's my question: Would it be practical to declare an array of 1 million?

If it's not practical to have an array size of 1 million then what would you guys suggest? Do I have to split up the data into multiple files and have a smaller array size of say, 10,000? Or is there another data type besides arrays that I could use?

Would greatly appreciate any helpful responses, thanks!

PS: I'm coding in Java.

Community
  • 1
  • 1
Chrystle Soh
  • 89
  • 1
  • 1
  • 5
  • You could [sort separate chunks of the file](http://stackoverflow.com/questions/8832822/sorting-lines-of-an-enormous-file-txt-in-java) and perform a binary search on the file rather than storing all of it in an array (or a Collection of your choice). But what's the point? It's not taking up _that_ much memory, so you really shouldn't have to worry about it (i.e. doing anything else is overcomplicating things). – sgbj Sep 20 '13 at 05:57

7 Answers7

1

the maximum size of an array you can set is Integer.MAX_VALUE - 5. the memory address index is 32bit and there is an object header+length, so they still need to be addressed by that 32bit index

refer this post stackoverflowquestion

if the numbers that you sort falls inside a specific range of values then you can refer this table

byte: The byte data type is an 8-bit signed two's complement integer. It has a minimum value of -128 and a maximum value of 127 (inclusive). The byte data type can be useful for saving memory in large arrays, where the memory savings actually matters. They can also be used in place of int where their limits help to clarify your code; the fact that a variable's range is limited can serve as a form of documentation.

short: The short data type is a 16-bit signed two's complement integer. It has a minimum value of -32,768 and a maximum value of 32,767 (inclusive). As with byte, the same guidelines apply: you can use a short to save memory in large arrays, in situations where the memory savings actually matters.

int: The int data type is a 32-bit signed two's complement integer. It has a minimum value of -2,147,483,648 and a maximum value of 2,147,483,647 (inclusive). For integral values, this data type is generally the default choice unless there is a reason (like the above) to choose something else. This data type will most likely be large enough for the numbers your program will use, but if you need a wider range of values, use long instead.

long: The long data type is a 64-bit signed two's complement integer. It has a minimum value of -9,223,372,036,854,775,808 and a maximum value of 9,223,372,036,854,775,807 (inclusive). Use this data type when you need a range of values wider than those provided by int.

Src: java docs

Community
  • 1
  • 1
Thirumalai Parthasarathi
  • 4,541
  • 1
  • 25
  • 43
  • Isn't the limit Integer.MAX_VALUE - 8? http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/util/ArrayList.java#ArrayList.0MAX_ARRAY_SIZE – Silviu Burcea Sep 20 '13 at 06:27
  • @SilviuBurcea : it depends upon the JVM. you can take a look at the link i've provided in my answer. There is already a similar question in stack overflow. – Thirumalai Parthasarathi Sep 20 '13 at 10:32
1

Yes, it is totally practical to have an array size of one million. Anything else is just overly complicating things.

Louis Wasserman
  • 191,574
  • 25
  • 345
  • 413
1

If you're going to implement a binary search algorithm, you may consider using a binary search tree. A binary tree can have more efficient searching and sorting than arrays.

Wikipedia article on binary search trees: Binary Search Trees

  • +1 I am still shocked that people are not suggesting BST for such a large amount of data. – Silviu Burcea Sep 20 '13 at 06:25
  • 2
    A binary search of a sorted array is much faster than a binary search of a tree as the references are continuous in memory without loads of padding between. A BST is much better for updates O(log n) vs O(n) an int[] of one million will fit in 4 MB ie in L3 cache. A BST of 1 million will use about 50+ MB ie not in cache. – Peter Lawrey Sep 20 '13 at 06:34
1

For 1 million numbers declaring an array size of 1 millions is fine. Anything else would be unnecessarily complicating.

If you have really huge data then you can go for splitting the data , than sort and binary search. But 1 million it looks overly complicating things.

Trying
  • 14,004
  • 9
  • 70
  • 110
0

The data structure you should use for large sets depends very very much on the data type you're using, which in this case is a number (presumably int) or some such. Primitive arrays in Java are just memory blocks of the size of the variable times the length of the array, just as in C, so if you're using ints (4 bytes) and have a million of them, you'll only be using 4MB of memory for the array, and then you can just use Arrays.sort.

Answers for similar situations where you're sorting objects instead of primitives will depend on lots of variables, such as the size of the objects and whether they're going to be in a database, flat file, or so on.

chrylis -cautiouslyoptimistic-
  • 75,269
  • 21
  • 115
  • 152
0

You could try to use a Binary Tree

kovrik
  • 159
  • 1
  • 4
0

Java should be fine with an array of 1 million elements. The operations you perform on that array could take a long time if you use inefficient algorithms, however binary search should be fine.

Any duplicates can likely be ignored once the first one has been inserted into the binary search tree, and since you're just dealing with numbers (int or long) an array should be fine. Also, with just a little bit of math you can perform any binary tree operations that are needed directly on the elements in the array, with very few temporary variables for swapping entries, and for maintaining the overall number of elements that were used in the array (since all 1 million entries may not be filled).

Ryan
  • 712
  • 7
  • 21