21

I 've run into a scenario where I want to lowercase all the keys of a HashMap (don't ask why, I just have to do this). The HashMap has some millions of entries.

At first, I thought I 'd just create a new Map, iterate over the entries of the map that is to be lowercased, and add the respective values. This task should run only once per day or something like that, so I thought I could bare this.

Map<String, Long> lowerCaseMap = new HashMap<>(myMap.size());
for (Map.Entry<String, Long> entry : myMap.entrySet()) {
   lowerCaseMap.put(entry.getKey().toLowerCase(), entry.getValue());
}

this, however, caused some OutOfMemory errors when my server was overloaded during this one time that I was about to copy the Map.

Now my question is, how can I accomplish this task with the smallest memory footprint?

Would removing each key after lowercased - added to the new Map help?

Could I utilize java8 streams to make this faster? (e.g something like this)

Map<String, Long> lowerCaseMap = myMap.entrySet().parallelStream().collect(Collectors.toMap(entry -> entry.getKey().toLowerCase(), Map.Entry::getValue));

Update It seems that it's a Collections.unmodifiableMap so I don't have the option of

removing each key after lowercased - added to the new Map

Holger
  • 285,553
  • 42
  • 434
  • 765
sestus
  • 1,875
  • 2
  • 16
  • 16
  • 3
    Can't you insert the keys lower-cased in the first place? – Eran Dec 19 '16 at 14:43
  • 2
    No... I use an API to get this Map - it's not my code. – sestus Dec 19 '16 at 14:44
  • Is the case sensitivity of the original keys should be kept ? – davidxxx Dec 19 '16 at 14:46
  • @davidxxx no, not really. I just need the lowercased keys. I think my only option is, as BackSlash said, removing - then adding again. – sestus Dec 19 '16 at 14:48
  • Or just add a second (lowercased) key to the same object. – Mark Rotteveel Dec 19 '16 at 14:48
  • 3
    What you're trying to do is dangerous because in the original `Map` there can be a `key` and `Key`. How should your map behave then? – xenteros Dec 19 '16 at 14:48
  • 4
    a) no, this is unlikely to benefit from parallel processing, b) parallel processing doesn’t solve your memory issue, c) since you already answered the `remove` option yourself, there is only one solution: add more heap memory. – Holger Dec 19 '16 at 14:56
  • I agree with @Holger. Considering you can't edit the original map, you're going to have to create a new one, which will lead to the `OutOfMemory` exception. – Luke Melaia Dec 19 '16 at 14:58
  • @Holger ofc parallel processing won't solve the memory issue. But I think I ll benefit from parallel processing in speeding up the copy. And besides that, it's easy so why not make it parallel? – sestus Dec 19 '16 at 15:00
  • @sestus is the original Map can be used during the task of retrieving all lowercase keys and performing a related processing ? – davidxxx Dec 19 '16 at 15:04
  • @davidxxx no, the original map is an unmodifiable map. – sestus Dec 19 '16 at 15:04
  • 3
    @sestus: No, this will *not* benefit from parallel processing. Explaining the reasons would exceed the scope of this question. Besically, the standard `toMap` collector has to *merge* partial results and the merging costs are as high as any savings. The `String.toLowerCase()` operation simply is too cheap to compensate the overhead. – Holger Dec 19 '16 at 15:05
  • 5
    I really, really think that you should tell us the reason why you want to do this. This might be a case of "my fingernails are too long so I have to cut my hands off (don't ask why I just have to do this)". Maybe there's a better solution to your problem than lowercasing your unmodifiable map keys. – walen Dec 19 '16 at 15:06
  • @Holger this makes sense yes... thx! – sestus Dec 19 '16 at 15:06
  • @walen so the situation is as follows. I m using an API -not my code - to get some keys and some values (mixed case). Then I have a list of these same keys but in lowercase, and I need to get the value for every key in that list. Again, I don't have control over the code that generates that list.. – sestus Dec 19 '16 at 15:13
  • Or make full key scan on each lookup :-) and some minor additional caching. That may sound like a joke but consumes almost no memory. – Lachezar Balev Dec 19 '16 at 15:24
  • Can you easily serialize / deserialize the values in the map? – Tamas Rev Dec 19 '16 at 15:28
  • Ok so it is a case of "My fingernails are too long. I am not allowed to cut them. Also my nail trimmer is a chainsaw". Sorry about your hands! – walen Dec 19 '16 at 15:30
  • @walen most probably you are right... I 'd have so many options if I controlled that API code. I could remove the entries, I could wrap in a class and override `hashCode` to ingore case... Now I m stuck.. I think I ll give a try to creating every possible mixed case from the lowercase (thus avoiding the map copy), but this seems so wrong... – sestus Dec 19 '16 at 15:37

4 Answers4

41

Instead of using HashMap, you could try using a TreeMap with case-insensitive ordering. This would avoid the need to create a lower-case version of each key:

Map<String, Long> map = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
map.putAll(myMap);

Once you've constructed this map, put() and get() will behave case-insensitively, so you can save and fetch values using all-lowercase keys. Iterating over keys will return them in their original, possibly upper-case forms.

Here are some similar questions:

Community
  • 1
  • 1
Kenster
  • 23,465
  • 21
  • 80
  • 106
  • that's indeed an interesting approach. It wouldn't change the API it would just require a different instantiation of the Map. I try to utilize it. – sestus Dec 19 '16 at 16:36
  • So it seems that this did the job. The change was trivial - just the Map implementation was changed. Thx! – sestus Dec 20 '16 at 08:47
3

You cannot remove the entry while iterating over the map. You will have a ConcurentModificationException if you try to do this.

As the issue is an OutOfMemoryError, not a performance error, using parallel stream will not help either.

Despite some task on the Stream API will be done lately, this will still lead to have two maps in memory at some point so you will still have the issue.

To workaround it, I only saw two ways :

  • Give more memory to your process (by increasing -Xmx on the Java command line). Memory is cheap these days ;)
  • Split the map and work in chunks : for example you divide the size of the map by ten and you process one chunck at a time and delete the processed entries before processing the new chunk. By this instead of having two times the map in memory you will just have 1.1 times the map.

For the split algorithm, you can try someting like this using the Stream API :

Map<String, String> toMap = new HashMap<>();            
int chunk = fromMap.size() / 10;
for(int i = 1; i<= 10; i++){
    //process the chunk
    List<Entry<String, String>> subEntries = fromMap.entrySet().stream().limit(chunk)
        .collect(Collectors.toList());  

    for(Entry<String, String> entry : subEntries){
        toMap.put(entry.getKey().toLowerCase(), entry.getValue());
        fromMap.remove(entry.getKey());
    }
}
loicmathieu
  • 5,181
  • 26
  • 31
  • 2
    i´m just going to quote @xenteros. If there´s a `key` key and a `Key` key in the map the whole logic of having an only lowercase map would fail and the task wouldn´t make any sense anymore, as the final state shouldn´t ever be achievable (if you don´t just remove one of them) – SomeJavaGuy Dec 19 '16 at 15:03
  • It supposes that the original map is not used during this time. – davidxxx Dec 19 '16 at 15:06
  • 3
    The (updated) question says that the source map isn’t mutable at all (i.e. a map as returned by `Collections.unmodifiableMap`), so it doesn’t matter whether you try to remove chunks or single entries (iterating and removing at the same time works, if you use `Iterator.remove()`). – Holger Dec 19 '16 at 15:09
  • no it's not used. I suppose splitting is my only option. I have already given as much memory as possible to my JVM. Avoiding having to load the full map in memory could fix this issue. – sestus Dec 19 '16 at 15:09
  • Il you cannot add more memory nor be able to delete on the origin map so there is no solutions! You should to make your first map modifable for this to work. regarding the remark of @kevin-esche my algorithm will works in this case, it will just create only one entry for two in the origin map (overiding the entry) I suppose that this functionality is OK regarding the question. – loicmathieu Dec 19 '16 at 15:13
  • I don't understand how this code may work if the `fromMap` was created with `Collections.unmodifiableMap()`. That is a requirement of the OP. `fromMap.remove()` should throw `UnsupportedOperationException` – davidxxx Dec 19 '16 at 15:17
  • 1
    @davidxxx the answer was submitted before this additional information was provided (or at least they were close to each other, so i guess loicmathieu didn´t knew it by than) – SomeJavaGuy Dec 19 '16 at 15:18
  • Just looked. original question edited 25 mins ago and the question 19 mins ago. At last, it is not important. @Kevin Esche – davidxxx Dec 19 '16 at 15:20
0

the concerns in the above answers are correct and you might need to reconsider changing the data structure you are using.

for me, I had a simple map I needed to change its keys to lower case

take a look at my snippet, its a trivial solution and bad at performance

private void convertAllFilterKeysToLowerCase() {
    HashSet keysToRemove = new HashSet();
    getFilters().keySet().forEach(o -> {
        if(!o.equals(((String) o).toLowerCase()))
            keysToRemove.add(o);
    });
    keysToRemove.forEach(o -> getFilters().put(((String) o).toLowerCase(), getFilters().remove(o)));
}
Basheer AL-MOMANI
  • 14,473
  • 9
  • 96
  • 92
-1

Not sure about the memory footprint. If using Kotlin, you can try the following.

val lowerCaseMap = myMap.mapKeys { it.key.toLowerCase() }

https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/map-keys.html

Big Pumpkin
  • 3,907
  • 1
  • 27
  • 18