Basically, I need a double-connected Map, which can retrieve value from key and the inverse, I have checked this link, BUT also should be sorted according to the values AND should take multiple values for a single key (I cannot guarantee that there won't be an exact freq
for different keys).
So is there any structure with that criteria out there?
Below is the specific problem that imposed this need (maybe I have something missing while I was implementing this but if you know an answer to the above question then you can probably skip it):
I want to implement a bag-of-words method for some features. The idea is to keep only the top k
bins with the greatest frequency of occurrence.
To make it more specific let's say I have a codebook
double[10000][d] codebook
and a set of features double[][] features
. For each line in features
, which represent a feature I check the distance from each line in codebook
and assign it to the bin having this line a centroid.
Then I increment the index of this bin by 1 until all features have been used.
Then, I would like to keep only the top k
bins as results.
The part I am a bit stuck is the one for keeping only the top-k
bins. I use a BoundedPriorityQueue<Feature>
collection to achieve but I am not sure if there is some simpler approach.
public static BoundedPriorityQueue<Feature> boWquantizerLargerK(double[][] codebook, double[][] features, int featureLength, int maxNumWords) {
HashMap<Integer, Integer> boWMap = new HashMap<Integer, Integer>();
BoundedPriorityQueue<Feature> nn = new BoundedPriorityQueue<Feature>(new Feature(), maxNumWords);
for(int i = 0; i < features.length; i++) {
double[] distCodebook = new double[codebook.length];
for(int j = 0; j < codebook.length; j++) {
double[] dist = new double[featureLength];
for(int k = 0; k < featureLength; k++)
dist[k] = (codebook[j][k] - features[i][k])*(codebook[j][k] - features[i][k]);
distCodebook[j] = MathUtils.sum(dist);
}
Integer index = MathUtils.indexOfMin(distCodebook) + 1;
Integer freq;
if((freq = boWMap.get(index)) == null) {
boWMap.put(index, 1);
nn.offer(new Feature(1, index));
}
else {
boWMap.put(index, ++freq);
nn.offer(new Feature(freq, index));
}
}
return nn;
}
The Feature
class is a simple implementation of Comparator
:
public class Feature implements Comparator<Feature> {
private Integer freq;
private Integer word;
public Feature() {}
public Feature(Integer freq, Integer word) {
this.freq = freq;
this.word = word;}
public int compare(Feature o1, Feature o2) {
if ((o1).getFrequency() > (o2).getFrequency())
return -1;
else if ((o1).getFrequency() < (o2).getFrequency())
return 1;
else
return 0;}
public double getFrequency() {
return freq;}
}
To summarize the problem, I have a collection which has members pairs of values, the first representing the bin and the second the frequency. This collection is updated until all features have been processed and at which point I just want to keep the bins with the greatest values.
I am using a HashMap<Integer, Integer>
structure for the collection and a BoundedPriorityQueue<Feature>
for the top k
bins.