I have a program that takes each item from one list and compares it against every item in another list. It's been working fine so far, but the data is getting large and is going to exceed system memory.
I'm wondering what the best way is to compare two very large lists (maybe 5-10 GB each).
Here is a very simple example of what I'm doing (except that the lists are huge and the values in the for loop are actually processed/compared rather than printed).
    import java.util.Collection;
    import java.util.HashSet;
    import java.util.Arrays;

    public class comparelists {
        public static void main(String[] args) {
            String[] listOne = {"a", "b",
                                "c", "d",
                                "e", "f",
                                "g", "h",
                                "i", "j",
                                "k", "l"};
            String[] listTwo = {"one",
                                "two",
                                "three",
                                "four",
                                "five", "six", "seven"};
            for (int listOneItem = 0; listOneItem < listOne.length; listOneItem++) {
                for (int listTwoItem = 0; listTwoItem < listTwo.length; listTwoItem++) {
                    System.out.println(listOne[listOneItem] + " " + listTwo[listTwoItem]);
                }
            }
        }
    }
I realize there has to be some disk I/O here since the data won't fit in memory. My initial approach was to save both lists as files, read a batch of lines from listOne, stream the entire listTwo file against that batch, then read the next batch from listOne, and so on. Is there a better way? Or is there a Java way to access the lists as I do above, with the data swapped to disk as needed?
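
For reference, here is a minimal sketch of the batched approach described above. It assumes each list is saved as a UTF-8 text file with one item per line. The class name `ChunkedCompare`, the methods `compareAll` and `compareChunk`, and the batch size of 100,000 lines are all made up for illustration, and the println stands in for the real comparison.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical sketch: compare every line of one file with every line of another
// while keeping only a bounded batch of the first file in memory.
public class ChunkedCompare {

    // Reads fileOne in batches of at most chunkSize lines; for each batch,
    // streams all of fileTwo and calls compare on every (item, other) pair.
    public static void compareAll(Path fileOne, Path fileTwo, int chunkSize,
                                  BiConsumer<String, String> compare) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        try (BufferedReader one = Files.newBufferedReader(fileOne, StandardCharsets.UTF_8)) {
            List<String> chunk = new ArrayList<>();
            String line;
            while ((line = one.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() == chunkSize) {
                    compareChunk(chunk, fileTwo, compare);
                    chunk.clear();
                }
            }
            // Handle the final, possibly smaller, batch.
            if (!chunk.isEmpty()) {
                compareChunk(chunk, fileTwo, compare);
            }
        }
    }

    // Streams fileTwo once, line by line, against the in-memory batch.
    // Note: within a batch, pairs are visited in fileTwo order, not listOne order.
    public static void compareChunk(List<String> chunk, Path fileTwo,
                                    BiConsumer<String, String> compare) throws IOException {
        try (BufferedReader two = Files.newBufferedReader(fileTwo, StandardCharsets.UTF_8)) {
            String other;
            while ((other = two.readLine()) != null) {
                for (String item : chunk) {
                    compare.accept(item, other);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java ChunkedCompare <listOneFile> <listTwoFile>");
            return;
        }
        // The batch size is an assumption; it would be tuned to the available heap.
        compareAll(Paths.get(args[0]), Paths.get(args[1]), 100_000,
                (a, b) -> System.out.println(a + " " + b));
    }
}
```

With this layout, listTwo is read from disk once per batch of listOne, so a larger batch means fewer passes over the second file.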