How can I store the inverted document index on a disk?

Question

I know this question has been asked again and again on Stack Overflow and Google, but none of the answers I found satisfy me. Most solutions assume that the whole index fits in memory, so it can be written to disk with Java serialization, and the whole index must be loaded back into memory whenever it is needed (for example: solution 1, solution 2). That assumption does not always hold, so how should I store an inverted document index on disk when it does not fit in memory?

I would appreciate a solution in Java.

How is your structure implemented? Are the terms in the index also too large to store, or only the document lists? Do you want to keep memory usage close to zero, or have a structure that keeps "frequent" terms in memory to reduce disk access? All of this will affect how you would store and access the index. – Roger Lindsjö, Mar 15 '12 at 13:01

score 1 · Accepted Answer · answered Mar 15 '12 at 13:17

I would try JDBM3. It supports tree and hash collections, and its only requirement is that each key or entry fits in memory.

If you have very large entries, I suggest storing each one as a file, which can be memory mapped to extract portions of the data. In the lookup table you can store keys to file names (or make the file names the keys).
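The file-per-entry idea can be sketched with the JDK alone. This is a minimal illustration under stated assumptions, not a tuned implementation: the class name `MappedPostingFiles`, the file names `term-0.bin` and `term-1.bin`, and the on-disk layout (document IDs as 4-byte big-endian ints) are all made up for the example. The lookup table here is a plain `HashMap` from term to file; in practice it could itself live in a disk-backed map.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class MappedPostingFiles {

    /** Writes one term's document IDs to its own file as 4-byte big-endian ints. */
    public static Path writePostings(Path dir, String fileName, int[] docIds) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(docIds.length * Integer.BYTES);
        for (int id : docIds) {
            buf.putInt(id);
        }
        Path file = dir.resolve(fileName);
        Files.write(file, buf.array());
        return file;
    }

    /** Maps only postings [from, from + count) of the file and copies them out. */
    public static int[] readPostings(Path file, int from, int count) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY,
                    (long) from * Integer.BYTES, (long) count * Integer.BYTES);
            int[] docIds = new int[count];
            map.asIntBuffer().get(docIds);
            return docIds;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("postings");

        // Lookup table: term -> file holding that term's posting list (hypothetical names).
        Map<String, Path> lookup = new HashMap<>();
        lookup.put("index", writePostings(dir, "term-0.bin", new int[] {2, 5, 9, 14, 21}));
        lookup.put("disk", writePostings(dir, "term-1.bin", new int[] {1, 5}));

        // Read three postings of "index", starting at position 1.
        int[] slice = readPostings(lookup.get("index"), 1, 3);
        System.out.println(Arrays.toString(slice)); // prints [5, 9, 14]
    }
}
```

Because every posting has a fixed width, any range of a list can be mapped by offset without reading the rest of the file. A real index would probably compress the lists and use sanitized or generated file names, since terms may contain characters that are not valid in a path.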

Peter Lawrey

An inverted index needs to support multiple values per key. This is hardly possible with MapDB, as its docs state: "Multimap is a Map which associates multiple values with a single key. [...] It can be written as `Map<Key,List<Value>>`, but that does not work well in MapDB, we need keys and values to be immutable, and List is not immutable." – rudi Aug 20 '19 at 10:56
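One generic way around the immutability constraint is to store each posting as its own immutable (term, docId) entry in a sorted set and fetch a term's documents with a range scan. The sketch below uses only the JDK: `TreeSet` is an in-memory stand-in for whatever disk-backed sorted set is available, and the names `CompositeKeyPostings` and `Posting` are hypothetical. It does not use or imitate any MapDB API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class CompositeKeyPostings {

    /** Immutable (term, docId) pair: one entry per posting, no mutable list per term. */
    static final class Posting {
        final String term;
        final int docId;

        Posting(String term, int docId) {
            this.term = term;
            this.docId = docId;
        }
    }

    /** Orders entries by term, then by document ID, so each term forms a contiguous range. */
    static final Comparator<Posting> ORDER =
            Comparator.comparing((Posting p) -> p.term).thenComparingInt(p -> p.docId);

    // In-memory stand-in for a disk-backed sorted set (assumption for this sketch).
    private final NavigableSet<Posting> postings = new TreeSet<>(ORDER);

    public void add(String term, int docId) {
        postings.add(new Posting(term, docId));
    }

    /** Range scan over every entry whose term matches, in ascending docId order. */
    public List<Integer> docsFor(String term) {
        List<Integer> docs = new ArrayList<>();
        for (Posting p : postings.subSet(new Posting(term, Integer.MIN_VALUE), true,
                                         new Posting(term, Integer.MAX_VALUE), true)) {
            docs.add(p.docId);
        }
        return docs;
    }

    public static void main(String[] args) {
        CompositeKeyPostings index = new CompositeKeyPostings();
        index.add("disk", 7);
        index.add("index", 3);
        index.add("disk", 2);
        index.add("disk", 7); // duplicate posting, ignored by the set

        System.out.println(index.docsFor("disk"));  // prints [2, 7]
        System.out.println(index.docsFor("index")); // prints [3]
    }
}
```

Adding a document to a term then inserts one small entry instead of rewriting a whole list value, which is the operation that a mutable `List` value makes awkward.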

score 0 · Answer 2 · answered Feb 04 '19 at 14:06

An update after some years.

JDBM3 is no longer supported; MapDB is its replacement. It offers several ways to store data (memory maps, etc.) that should meet your requirement.

Akita
