1

Following is the scenario which i had done couple of times ..

Count the frequency of words in a paragraph.

I create a Map and store the count. SO my map contains

  <Today, 10>
  <the, 123>
  <hello,1>
  <dont, 20>

Now the other scenario comes , identifying words with count 100 or 30

I create a map of list or map of

<10, [today,...]>
   <123,[the,...]>

or <10, 2> <123,1> Basically I have two maps to handle all the work.. This works fine and any update on one , the other has to be updated.

The retrieve and inserting time is almost O(1). But this is not that memory efficient.

What other approaches can be used ?

Belphegor
  • 4,456
  • 11
  • 34
  • 59
harshit
  • 7,925
  • 23
  • 70
  • 97
  • 2
    bi-directional maps are the datastructures for this approach: http://stackoverflow.com/questions/1670038/does-java-have-a-hashmap-with-reverse-lookup – Andreas Dolk Jan 08 '13 at 23:03
  • I would recommend bidirectional maps as well. – DarthVader Jan 08 '13 at 23:05
  • 3
    @Andreas_D - Not the same thing. Bidirectional maps require that the values as well as the keys be unique. It cannot be used when, for instance, two words have the same frequency. – Ted Hopp Jan 08 '13 at 23:06
  • Ted, yes, you're right. BiMaps won't work for this task. – Andreas Dolk Jan 08 '13 at 23:07
  • Is this really a big memory problem? After all, `String`s are immutable and reusable, you don't have to carry around a second copy of every word just because you hav a second map, no? Or are you worried about the memory overhead of the second map structure itself? – us2012 Jan 08 '13 at 23:16
  • nope its not a big problem, just trying to figure out something better .. I know memory is aint that big issue these days. – harshit Jan 09 '13 at 00:02

1 Answers1

1

Once you have your (word, frequency) pairs, you can create an array of them, sort by frequency, and do a binary search. This will slow down the access to O(log n) but you can get by with about half the memory, if that's the constraint.

Other than that, I don't see anything better than what you're already doing.

Ted Hopp
  • 232,168
  • 48
  • 399
  • 521