0

I have been learning few basic memory related concepts.

Let's say if I have more then 3 GB data to sort, then is it possible to sort it on 32 bit system or 32 bit JVM.

This Heap has to completely reside in RAM or it can reside in hard disk as well?

trincot
  • 317,000
  • 35
  • 244
  • 286
Onki
  • 1,879
  • 6
  • 38
  • 58
  • 1
    Review [this](http://stackoverflow.com/questions/2457514/understanding-max-jvm-heap-size-32bit-vs-64bit) regarding maximum heap size. – PM 77-1 Jun 21 '15 at 19:40
  • possible duplicate of [How do I sort very large files](http://stackoverflow.com/questions/7918060/how-do-i-sort-very-large-files) – Jim Mischel Jun 23 '15 at 13:51

1 Answers1

2

You can definitely sort more than 3GB of data in a 32-bits system.

The trick is to choose a sort algorithm that doesn't need to allocate the entire array in memory at the same time. One way to achieve this is by using something like Bucket Sort, External Merge sort, or a divide-and-conquer sort algorithm that doesn't require loading the whole data set in memory at once.

In general, divide-and-conquer algorithms work by splitting the original data (say that you have 40 GB of data) into smaller segments (e.g. 1 GB each), sort each segment individually and then merge these segments until the data is completely sorted.

Check this post for a links to similar sort algorithms.

Community
  • 1
  • 1
Mike Laren
  • 8,028
  • 17
  • 51
  • 70
  • You might want to look up the definition of a [divide and conquer algorithm](https://en.wikipedia.org/wiki/Divide_and_conquer_algorithms). What you describe is an [external sort](https://en.wikipedia.org/wiki/External_sorting). Quicksort is an example of a divide and conquer algorithm, and it requires the entire array to be in memory if you want it to be efficient. Furthermore, you do not "merge segments one by one." Instead, you merge all of the segments in a single pass using a M-way merge algorithm. – Jim Mischel Jun 23 '15 at 13:46
  • I agree with your points. The intent of my post was not to imply that any divide-and-conquer algorithm would solve the problem but to suggest looking at divide-and-conquer algorithms (like external merge sort) as possible solutions that would do this kinf of sorting on 32-bit JVMs. I'll edit the post to clarify this! – Mike Laren Jun 23 '15 at 18:50
  • But you didn't address either of my points. First, an external sort is not a divide and conquer algorithm. Second, you don't merge segments one-by-one. The only helpful parts of this post are your first and last sentences. The two middle paragraphs are at best confusing and in some places flat wrong. – Jim Mischel Jun 23 '15 at 18:55
  • I think we have different opinions regarding external sort being a divide and conquer algorithm. The typical external sort is a variant of merge sort (which is inherently a divide and conquer algorithm) where the original data is divided in multiple segments, which are sorted individually and later merged to form the sorted result. The one and only aspect that makes it "external" is that segments that are not being worked on are offloaded to external storage and reloaded when necessary. Which part makes it not fall under the DaQ category? – Mike Laren Jun 23 '15 at 19:39
  • The first sentence of the Divide and Conquer Wikipedia topic (I linked it in my previous comment) says: "In computer science, divide and conquer (D&C) is an algorithm design paradigm based on multi-branched recursion." An external sort of the type that you describe is not such an algorithm. And the typical external sort is a *much* different thing than a merge sort. – Jim Mischel Jun 23 '15 at 20:58