2

I need to store keys as case insensitive, and all values for keys like STATE/state/State are merged into one Set. However the catch is I need the case sensitive version of the original key back at some point so a generic CaseInsensitiveMap doesn't work. I only need back the first capitalization of 'state' added, so in this case I keep STATE and discard state/State.

I've looked at a few options for implementing this data structure, like Guava HashMultimap and Tuples, but none seem quite right.

<CaseInsensitiveOriginalKey, OriginalKey, Set<Values>>

So for example if I add a key 'State' with values {Texas, Oklahoma} it will be stored as:

<state, State, {Texas, Oklahoma}>

The idea being if I create some kind of .add(StATe, {Nebraska}) then the map, seeing a case-insensitive entry for 'state' already exists, becomes:

<state, State, {Texas, Oklahoma, Nebraska}>

and for a new key, .add(COLOR, {blue, red})

The overall map becomes:

<state, State, {Texas, Oklahoma, Nebraska}>
<color, COLOR, {blue, red}>
  • .get(ColoR) returns {red, blue}
  • .getKey(coLOR) returns COLOR

Any ideas on how to best accomplish this?

Community
  • 1
  • 1
xref
  • 1,707
  • 5
  • 19
  • 41

3 Answers3

2

You can maintain two maps:

  • One is a Map<String, Set<String>> that maps the case-insensitive key to the corresponding set of strings (e.g. "state" → {"Texas", "Oklahoma"}).

  • The other is a Map<String, String> that maps the case-insensitive key to its corresponding case-sensitive key (e.g. "state" → "State").

You can create your own class that has these two maps as private fields and ensures that they are kept in sync whenever a pairing is added/removed/updated.

arshajii
  • 127,459
  • 24
  • 238
  • 287
  • That was one of the suggestions in the link above, but would probably require some structure to make sure the maps don't get out of sync, maybe a custom class that extends HashMap in some fashion? – xref Aug 08 '13 at 23:35
  • What if "state" needed to map to "State", "STATE", and "STate"? – Ted Hopp Aug 08 '13 at 23:35
  • @xref Yes, you can create your own class that hides the maps and ensures they are kept in sync whenever you add/remove pairings. – arshajii Aug 08 '13 at 23:35
  • @TedHopp The way I understand it, there should only be one case-insensitive to case-sensitive pairing for each word. Maybe I'm mistaken? – arshajii Aug 08 '13 at 23:36
  • @TedHopp in my case anyway the first key wins. Ultimately it will be the header of a csv column. There can only be one column for 'StAtE' and it will be the first version of 'StatE' added to the Map. calling .get(STate) will return {Texas, Oklahoma, Nebraska} and .getKey(STAte) would return 'State' from the example above. – xref Aug 08 '13 at 23:38
  • I think you have it backwards. OP wants to put a value using "State" and retrieve the same value using "STATE" as the key. – Ted Hopp Aug 08 '13 at 23:38
  • @TedHopp Right, he can do that with the case-insensitive map. – arshajii Aug 08 '13 at 23:39
  • I just don't get it. How is this going to work with the keys "state" and "sTaTe"? How is OP going to recover both keys? – Ted Hopp Aug 08 '13 at 23:47
  • @TedHopp The OP wants a case-insensitive pairing between `"state"` and some set. He can store all the keys in lower-case (for example) and use `map.get(key.toLowerCase())`. This is the job of the first map I described. Is that what you meant? – arshajii Aug 08 '13 at 23:51
  • That part I see. It's the second part - how is your data structure going to remember the case-sensitive keys? (_"I need the case sensitive version of the original key back at some point"_.) The second map only stores a single case-sensitive key for each case-insensitive key. – Ted Hopp Aug 08 '13 at 23:54
  • 1
    @TedHopp actually I'll only need a single version of 'State', whatever makes it into the map first. I won't have to recover a later addition of 'StAtE', the values will be merged into the first map and the key 'StAtE' will be discarded – xref Aug 08 '13 at 23:57
2

What you need is something like Map<CaseInsensitiveOriginalKey, Record> where Record is a custom class with the original (case-sensitive) key and the set of values as attributes.

You could get away with using a generic Pair class instead of a custom Record class, but (IMO) that would be poor design.


However, there is a problem with your requirements:

However the catch is I need the case sensitive version of the original key back ...

Your examples indicate that you could have multiple case sensitive versions of the original key; i.e. the one that you saw first (e.g. "State") and subsequent ones (e.g. "STate", "state", etc). So which is the correct original key to use? And what about the case where the first one you saw was ... erm ... junky?

The point is that treating the first version that you saw as the definitive / preferred one is going to be problematic. You need something (or someone) to figure out the definitive version intelligently. To do that you probably need to keep all of the versions that you saw until (at least) you completed the initial data capture phase. You may even need to keep their frequencies and/or their contexts.

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
  • Good idea about the frequencies/contexts. There is a preferred capitalization which will always be added to the map first if it exists, otherwise yeah, it's thunderdome and the first key in wins. – xref Aug 09 '13 at 00:05
0

I'd suggest a data structure that has a couple of maps. One is a map from each (case-sensitive) key to the case-insensitive key and the other is a map from the case-insensitive key to the value. Given a case-sensitive key, each access would be a two-step affair: find the case-insensitive key to use from the first map and then use the key with the second map.

Ted Hopp
  • 232,168
  • 48
  • 399
  • 521