0

My application in Java require a hash table for its calculation and it has to do millions of look up into this hash table. The hash table must be readable from the disk into the HashTable utility very fast and the data within hast table is static and no insert or delete required.

Do you recommend using any available lib to do so?

Also, the size of the data is less than 200MB.

iCode
  • 4,308
  • 10
  • 44
  • 77
  • what are the requirements for the file you're reading/writing to? Does it need to be human readable? – Matt Jul 10 '12 at 01:46

2 Answers2

1

If your data is static, why not just use a plain old array and lookup by index? Whatever key you intended to use, just provide an index attribute. Of course, if you exceed the maximum possible array length you'll need to shard across multiple arrays.

I'd say no hash function can beat direct random access and the cost of assigning indexes over your key set (your "perfect hash function") will be up front, during initialization and not for each lookup.

Community
  • 1
  • 1
Edward Samson
  • 2,395
  • 2
  • 26
  • 39
1

If being human readable is not a requirement, you could gasp resort to just making sure your data implements the Serializable interface and serializing the HashMap using an ObjectOutputStream. It's ugly but it would get the job done.

Another option would be DataInputStream and DataOutputStream. These allow you to read/write structured binary data.

Let's assume you have a HashMap, you could write it like this:

// realOutputStream should probably be a BufferedOutputStream
DataOutputStream output = new DataOutputStream( realOutputStream );
for (Map.Entry<Long, String> entry : map.entrySet()) {
    // Write the key
    output.writeLong(entry.getKey().longValue());
    byte bytes[] = entry.getBytes("UTF-8");
    // Writing the string requires writing the length and then the bytes
    output.writeInt(bytes.length);
    output.write(bytes, 0, bytes.length);
}



// realInputStream should probably be a BufferedInputStream
DataInputStream input = new DataInputStream ( realInputStream );
Map<Long, String> map = new HashMap<Long, String>();
while ( true ) {
   try {
     // read the key
     long key = output.readLong();
     // read the string length in bytes
     int strlen = output.readInt();
     // read the bytes into an array
     byte buf[] = new byte[strlen];
     output.readFully(buf, 0, strlen);
     // Create the map entry.
     map.put(Long.valueOf(key), new String(buf,"UTF-8"));
   }
   catch (EOFException e) {
     // input is exhausted
     break;
   }
}

Keep in mind this is assuming you want to store and read the string as UTF. You could just as easily not supply a character set and use the jvm default encoding. Also notice that something with variable length like a String would require you to write the length of that data first before writing the actual data. This is so you can know how many bytes you need to read in to reconstruct that string.

Matt
  • 11,523
  • 2
  • 23
  • 33