
I know this question has been asked again and again on Stack Overflow and Google, but none of the answers satisfy me. Most of the solutions assume that the whole index fits in memory, so it can be written to disk with Java serialization; when the index is needed, the whole index must be loaded back into memory. Examples of such solutions: solution 1, solution 2. But this assumption is not always true, so how should I store an inverted document index on disk when it doesn't fit in memory?

I would appreciate a solution in Java.

amit
jerry_sjtu
  • How is your structure implemented? Are the terms in the index also too large to store in memory, or only the document lists? Do you want to keep memory usage close to zero, or have a structure that keeps "frequent" terms in memory to reduce disk access? All of this will affect how you would store and access the index. – Roger Lindsjö Mar 15 '12 at 13:01

2 Answers


I would try JDBM3. It supports tree and hash collections, and the only requirement is that each key or entry fits in memory.

If you have very large entries, I suggest storing each one in its own file, which can be memory mapped so you can extract portions of the data. In the lookup table you can map keys to file names (or make the file names the keys).
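The file-per-entry idea can be sketched with nothing but the JDK's `FileChannel.map`. This is a minimal sketch, assuming each posting list is an array of `int` document ids; the class name `MappedPostings`, the `postings-N.bin` naming scheme, and the raw 4-byte layout are hypothetical choices, not part of JDBM3 or any other library.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch (hypothetical design): one file per posting list, plus a small
 * in-memory lookup table from term to file name.
 */
public class MappedPostings {
    private final Path dir;
    // Only terms and file names stay in memory; the postings live on disk.
    private final Map<String, Path> termToFile = new HashMap<>();
    private int nextFileId = 0;

    public MappedPostings(Path dir) {
        this.dir = dir;
    }

    /** Writes the posting list as raw 4-byte ints through a writable mapping. */
    public void write(String term, int[] docIds) {
        Path file = dir.resolve("postings-" + (nextFileId++) + ".bin");
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            // Mapping beyond the current end of the file grows it to this size.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0,
                    (long) docIds.length * Integer.BYTES);
            buf.asIntBuffer().put(docIds);
            buf.force(); // flush the mapped pages to disk
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        termToFile.put(term, file);
    }

    /** Maps only the slice [from, from + count) of a term's posting list. */
    public int[] read(String term, int from, int count) {
        Path file = termToFile.get(term);
        if (file == null || from < 0 || count <= 0) {
            return new int[0];
        }
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long total = ch.size() / Integer.BYTES;
            int n = (int) Math.max(0, Math.min(count, total - from));
            if (n == 0) {
                return new int[0]; // a read-only mapping cannot extend past EOF
            }
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY,
                    (long) from * Integer.BYTES, (long) n * Integer.BYTES);
            int[] out = new int[n];
            buf.asIntBuffer().get(out);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because `read` maps only the requested region, a very long posting list never has to be loaded in full. One file per term can run into file-handle or inode limits for a large vocabulary; a single postings file with a term-to-(offset, length) table follows the same pattern.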

Peter Lawrey
  • An inverted index needs to support multiple values per key. This is hardly possible with MapDB, as its docs state: "Multimap is a Map which associates multiple values with a single key. [...] It can be written as `Map<K,List<V>>`, but that does not work well in MapDB, we need keys and values to be immutable, and List is not immutable." – rudi Aug 20 '19 at 10:56
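One way around the immutability limitation quoted above, sketched here under assumptions, is to avoid list values entirely: flatten each (term, document id) pair into a single immutable `String` key, keep the keys in a sorted set, and answer a term lookup with a range scan. The class name `CompositeKeyIndex`, the `\u0000` separator, and the zero-padded id encoding are hypothetical. The example uses an in-memory `TreeSet`; the assumption is that a disk-backed `NavigableSet` (for example, a B-tree set from a library such as MapDB) could be passed in instead, which should be checked against that library's documentation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

/**
 * Sketch: a multimap term -> {docId} stored as a sorted set of immutable
 * composite keys "term\u0000docId", so no mutable List is ever a value.
 */
public class CompositeKeyIndex {
    private static final char SEP = '\u0000'; // assumed never to occur inside a term
    private final NavigableSet<String> keys;

    public CompositeKeyIndex(NavigableSet<String> backing) {
        this.keys = backing; // e.g. a TreeSet, or (assumed) a disk-backed set
    }

    // Zero-padding makes lexicographic order match numeric order for docId >= 0.
    private static String key(String term, int docId) {
        if (docId < 0) {
            throw new IllegalArgumentException("docId must be non-negative");
        }
        return term + SEP + String.format("%010d", docId);
    }

    public void add(String term, int docId) {
        keys.add(key(term, docId)); // duplicates collapse because this is a set
    }

    /** Range scan over [term + SEP, term + (SEP + 1)), returned in docId order. */
    public List<Integer> postings(String term) {
        String lo = term + SEP;
        String hi = term + (char) (SEP + 1);
        List<Integer> out = new ArrayList<>();
        for (String k : keys.subSet(lo, true, hi, false)) {
            out.add(Integer.parseInt(k.substring(lo.length())));
        }
        return out;
    }
}
```

Each key is an immutable string, so nothing is mutated in place: adding a posting is just inserting one more key, and a term's postings come back already sorted by document id.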

An update after some years.

JDBM3 is no longer supported; MapDB is its replacement. It offers several ways to store data (memory-mapped files, etc.) that will meet your requirement.

Akita