I have a large HashMap<String, List<String>> that I want to save to a file. I don't want to serialize it with Java's default serialization because it also stores a lot of things I don't need, such as class information (I basically only want the strings). I would also like to know where each key is stored in the file, so that I don't have to scan the entire file to find it (the file/HashMap will be too large to keep entirely in memory). My idea was to loop through the map while writing, calculate how many bytes are used for each key/value pair, and store the exact position of each pair in a HashMap of the form <String, Long>.

For example, imagine I have this HashMap:

{
"car01":["car", "coche", "macchina", "automobil"],
"dog01": ["dog", "perro", "cane", "cao"]
}

Then the file could be something like

car01[car,coche,macchina,automobil]dog01[dog,perro,cane,cao]

And the index hashmap could be something like

{"car01":0, "dog01":35}

I tried iterating like this:

    long characterCount = 0;

    HashMap<String, List<String>> index = indexOfIndexes.get(indexName);
    Path path = Paths.get(outputfile);
    try(Writer writer = Files.newBufferedWriter(path)) {
        index.forEach((key, value) -> {
            try { 
                writer.write(key + value);
            }
            catch (IOException ex) { throw new UncheckedIOException(ex); }
        });
    } catch(UncheckedIOException ex) { throw ex.getCause(); }

But I don't know how to calculate the amount of characters/bytes used efficiently each time.

  • Use a database. This is like reinventing the wheel. There are fine lightweight databases that integrate well with Java, such as H2 and SQLite. – RealSkeptic Nov 18 '19 at 17:28
  • You would be absolutely right 99% of the time, but this time I actually have my reasons to do this. Any help would be greatly appreciated! –  Nov 18 '19 at 17:38

2 Answers

I think you can use String's getBytes method to calculate the serialized length. Something like:

long characterCount = 0;

HashMap<String, List<String>> index = indexOfIndexes.get(indexName);
Map<String, Long> count= new HashMap<>();
Path path = Paths.get(outputfile);
try(Writer writer = Files.newBufferedWriter(path)) {
    index.forEach((key, value) -> {
        try {
            count.put(key, characterCount);
            writer.write(key + value);
            // Files.newBufferedWriter(path) encodes as UTF-8, so count UTF-8 bytes
            characterCount = characterCount + (key + value).getBytes(StandardCharsets.UTF_8).length;
        }
        catch (IOException ex) { throw new UncheckedIOException(ex); }
    });
} catch(UncheckedIOException ex) { throw ex.getCause(); }
Haijin
  • It's probably correct, but I get Local variable characterCount defined in an enclosing scope must be final or effectively final. Do you know what to do about that? –  Nov 19 '19 at 12:16
  • Can you just define an another non-final variable and use it? – Haijin Nov 19 '19 at 16:35
  • No, you can't modify variables inside the lambda like that: https://stackoverflow.com/questions/38402493/local-variable-log-defined-in-an-enclosing-scope-must-be-final-or-effectively-fi –  Nov 20 '19 at 11:33
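
The "effectively final" error from the comments above can be avoided while keeping the lambda by holding the running offset in a mutable container such as an AtomicLong. The following is only a sketch under some assumptions: the class name OffsetCounterSketch and the method writeWithOffsets are made up for illustration, each record is written as key + value.toString() as in the answer, and both the writer and the byte count use UTF-8 explicitly so that the counted bytes match the bytes on disk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class OffsetCounterSketch {

    // Hypothetical helper: writes every entry as key + value.toString() and
    // returns the byte offset at which each key's record starts.
    static Map<String, Long> writeWithOffsets(Map<String, List<String>> index, Path path)
            throws IOException {
        Map<String, Long> offsets = new LinkedHashMap<>();
        // The AtomicLong reference is effectively final; only its content changes.
        AtomicLong position = new AtomicLong(0);
        try (Writer writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            index.forEach((key, value) -> {
                String record = key + value;
                try {
                    writer.write(record);
                } catch (IOException ex) {
                    throw new UncheckedIOException(ex);
                }
                // getAndAdd returns the offset before this record was added.
                int byteLength = record.getBytes(StandardCharsets.UTF_8).length;
                offsets.put(key, position.getAndAdd(byteLength));
            });
        } catch (UncheckedIOException ex) {
            throw ex.getCause();
        }
        return offsets;
    }
}
```

Note that List.toString() separates elements with ", " (comma and space), so the offsets differ slightly from the comma-only layout shown in the question. A plain for loop avoids the restriction altogether and is probably the simpler choice.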

Based on @Haijin's answer:

    HashMap<String, List<String>> index = indexOfIndexes.get(indexName);
    HashMap<String, Long> count = new HashMap<>();
    long characterCount = 0; // byte offset at which the next record starts

    // FileWriter and getBytes() both use the platform default charset here,
    // so the counted bytes match the bytes actually written.
    try (Writer writer = new BufferedWriter(new FileWriter(outputfile))) {
        for (Map.Entry<String, List<String>> entry : index.entrySet()) {
            String record = entry.getKey() + entry.getValue();
            count.put(entry.getKey(), characterCount);
            writer.write(record);
            characterCount += record.getBytes().length;
        }

        characterPositions.put(indexName, count);
    } catch (IOException e) {
        e.printStackTrace();
    }
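
For completeness, looking a record up later could work roughly like the sketch below. It assumes the file was written as UTF-8 and that the record's length in bytes is known as well, for example by storing it next to the offset or by subtracting the offset from the start of the next record. The class name RecordReaderSketch and the method readRecord are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

public class RecordReaderSketch {

    // Hypothetical helper: reads `length` bytes starting at byte `offset`
    // and decodes them as UTF-8, without scanning the rest of the file.
    static String readRecord(Path file, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);               // jump straight to the record
            byte[] buffer = new byte[length];
            raf.readFully(buffer);          // throws EOFException if the file is too short
            return new String(buffer, StandardCharsets.UTF_8);
        }
    }
}
```

With the example file from the question, readRecord(path, 35, 25) would return dog01[dog,perro,cane,cao], since that record starts at byte 35 and is 25 bytes long.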