Possible Duplicate:
Are they any decent on-disk implementations of Java's Map?

I have a piece of code (that I didn't write) that reads millions of CSV rows into a Map, then processes them.

I got to the point where I simply ran out of RAM.

My options are

  1. Rewrite the code to stream the data. However, some calculations might need the entire data set (e.g. a calculation that needs both the very first and the very last row), so streaming alone may not be enough

  2. Write a class that implements java.util.Map but persists the data to a database

  3. Simply rewrite the code to insert into / select from a database directly, but I'd rather try #2 first

A DB-backed Map suddenly made sense to me, so before starting to write one, I wanted to ask whether there is a well-known pattern / implementation for this problem (perhaps not even a Map).

Now as much as I like writing code, I don't like reinventing things, and I prefer reusing open source code.

I don't mind much about the storage implementation, SQL or NoSQL, but it needs to allow a Map to be automatically persistent, and avoid keeping it entirely in memory.
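To make option 2 concrete, here is a minimal sketch of what such a wrapper could look like. Everything in it is an assumption for illustration: `KeyValueStore`, `StoreBackedMap` and `InMemoryStore` are hypothetical names, keys and values are plain non-null Strings, and the in-memory store only stands in for a real JDBC or NoSQL implementation so the example runs without a database. It assumes Java 8 or later.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical storage abstraction. A real implementation would issue
// SELECT / INSERT / DELETE statements (or the NoSQL equivalent) here.
interface KeyValueStore {
    String load(String key);                // null when the key is absent
    String save(String key, String value);  // returns the previous value, or null
    String delete(String key);              // returns the removed value, or null
    Iterator<String> keys();                // ideally a cursor, not a full copy
    int count();
}

// Map facade: every call is delegated to the store, nothing is cached here.
class StoreBackedMap extends AbstractMap<String, String> {
    private final KeyValueStore store;

    StoreBackedMap(KeyValueStore store) {
        this.store = store;
    }

    // Point lookups are overridden so they do not scan entrySet().
    @Override public String get(Object key) {
        return key instanceof String ? store.load((String) key) : null;
    }
    @Override public boolean containsKey(Object key) {
        return get(key) != null; // assumes null values are never stored
    }
    @Override public String put(String key, String value) {
        return store.save(key, value);
    }
    @Override public String remove(Object key) {
        return key instanceof String ? store.delete((String) key) : null;
    }
    @Override public int size() {
        return store.count();
    }

    // Iteration walks the store's keys and loads each value on demand.
    // Iterator.remove() is not supported in this sketch.
    @Override public Set<Map.Entry<String, String>> entrySet() {
        return new AbstractSet<Map.Entry<String, String>>() {
            @Override public int size() {
                return store.count();
            }
            @Override public Iterator<Map.Entry<String, String>> iterator() {
                final Iterator<String> keys = store.keys();
                return new Iterator<Map.Entry<String, String>>() {
                    @Override public boolean hasNext() {
                        return keys.hasNext();
                    }
                    @Override public Map.Entry<String, String> next() {
                        String k = keys.next();
                        return new AbstractMap.SimpleImmutableEntry<>(k, store.load(k));
                    }
                };
            }
        };
    }
}

// Stand-in store so the sketch runs without a database.
class InMemoryStore implements KeyValueStore {
    private final Map<String, String> rows = new TreeMap<>();
    public String load(String key) { return rows.get(key); }
    public String save(String key, String value) { return rows.put(key, value); }
    public String delete(String key) { return rows.remove(key); }
    public Iterator<String> keys() { return rows.keySet().iterator(); }
    public int count() { return rows.size(); }
}

class StoreBackedMapDemo {
    public static void main(String[] args) {
        Map<String, String> rows = new StoreBackedMap(new InMemoryStore());
        rows.put("row1", "a,b,c");
        rows.put("row2", "d,e,f");
        System.out.println(rows.get("row1") + " / size=" + rows.size());
    }
}
```

The existing processing code would keep talking to a `java.util.Map`, and only the store behind it would change. Each `get` and `put` becomes a round trip to the store, so a real version would likely need batching or a small cache to stay usable.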

Is there such a known library / implementation? Is this problem familiar? Am I attacking it in the right way?


Update:

Based on the comments, I'll look into these (older, but pretty much duplicate) questions:

and vote to close this one if they answer my question and are still up to date.

Update2:

  1. This is not an exact duplicate: I'm looking for database-backed persistence, while the other questions are wider (any disk-based implementation)
  2. Duplicates are not always a bad thing; please read this post by Jeff Atwood before voting to close
Eran Medan
  • Oh, a tied hash. What a great idea! – Paul Tomblin Sep 12 '12 at 16:00
  • If you need more RAM, perhaps that is what you should do. Streaming it to a database can be 10-1000x slower. – Peter Lawrey Sep 12 '12 at 16:01
  • @PaulTomblin - really? or was it with a hint of sarcasm? :) – Eran Medan Sep 12 '12 at 16:02
  • Check out this question and see if it answers your needs. http://stackoverflow.com/questions/4815633/are-they-any-decent-on-disk-implementations-of-javas-map I'm not voting to close this question as a duplicate (for now) because almost 2 years have passed. – Gilbert Le Blanc Sep 12 '12 at 16:02
  • Have you seen [this](http://stackoverflow.com/questions/2654709/disk-based-hashmap) question? – ElderMael Sep 12 '12 at 16:02
  • @mael - no, and I did look, just not enough. I can vote to close my own question if you think it's a duplicate, though the new "dupes are ok" guidelines might be easy on this one, let me find that link... – Eran Medan Sep 12 '12 at 16:03
  • @GilbertLeBlanc I'll check both questions, if they are still relevant, I'll vote to close this one, thanks – Eran Medan Sep 12 '12 at 16:07
  • No sarcasm. I can think of a couple of places I could have used it. It's damn handy in Perl. – Paul Tomblin Sep 12 '12 at 16:08
  • I'll post this on Meta - but I think a "Question Ring" might be a nice idea (probably not new) so similar questions can be grouped into one "folder" (each one is an alias) so you can view ALL answers to the same question / duplicates, and see which one is the latest / have highest votes (and you can see multiple accepted answers too) – Eran Medan Sep 12 '12 at 16:09
  • Well, I believe this is not a duplicate but I think the question is related. – ElderMael Sep 12 '12 at 16:11
  • Before voting to close, please read this: http://blog.stackoverflow.com/2010/11/dr-strangedupe-or-how-i-learned-to-stop-worrying-and-love-duplication/ – Eran Medan Sep 12 '12 at 17:01
  • @PaulTomblin I don't think this is an exact duplicate. Disk based vs. database based is not the same: a database might not be disk based (it might be in memory, or a peer-to-peer cache with data replication only), and disk based might not be a database. Please explain why it's an EXACT duplicate. – Eran Medan Sep 14 '12 at 20:26
  • @PaulTomblin Another thing, even if the question is an exact duplicate, it doesn't have the same answers. Actually the answer I got here, is much more up to date and relevant than the suggested duplicate. I respect the decision, just wanted to hear your opinion on this. – Eran Medan Sep 14 '12 at 20:34
  • Since the problem as stated was lack of memory, your remark about an in memory database is a little specious, don't you think? I'm reminded about the old quote about using virtual memory to make a really big ram disk. – Paul Tomblin Sep 14 '12 at 20:42
  • http://meta.stackexchange.com/questions/147643/should-i-vote-to-close-a-duplicate-question-even-though-its-much-newer-and-ha – Eran Medan Sep 19 '12 at 20:06

1 Answer

Many key-value stores provide a Map interface. For example, https://github.com/jankotek/JDBM3

See also SO questions:

key-value store suggestion

Java disk-based key-value storage

Alexei Kaigorodov