I have a really large file with approximately 15 million entries. Each line in the file contains a single string (call it a key).
I need to find the duplicate entries in the file using Java. I tried using a HashMap to detect duplicates, but that approach throws a "java.lang.OutOfMemoryError: Java heap space" error.
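Here is a minimal version of what I have (I use a HashSet, which is backed by a HashMap internally, to track keys I've already seen; the class name and the file path taken from args[0] are just placeholders):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.HashSet;
    import java.util.Set;

    public class FindDuplicates {
        public static void main(String[] args) throws IOException {
            Set<String> seen = new HashSet<>();
            Set<String> duplicates = new HashSet<>();

            // Read the file line by line; each line is one key.
            try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // add() returns false if the key was already in the set,
                    // i.e. this line is a duplicate.
                    if (!seen.add(line)) {
                        duplicates.add(line);
                    }
                }
            }
            System.out.println("Duplicate keys found: " + duplicates.size());
        }
    }

The OutOfMemoryError is thrown while the set is still being filled, so all 15 million keys never fit in the default heap at once.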
How can I solve this problem?
I know I could increase the heap space and try again, but I wanted to know whether there are more efficient solutions that don't require tweaking the heap size.
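For reference, I assume the heap tweak would just be something like the following, where 4g is only my guess at what 15 million String keys plus HashMap overhead might need, and FindDuplicates/input.txt are the placeholder names from my sketch above:

    java -Xmx4g FindDuplicates input.txt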