Basically, I have a collection std::vector<std::pair<std::vector<float>, unsigned int>> which contains pairs of templates std::vector<float> of size 512 (2048 bytes) and their corresponding identifiers unsigned int.
I am writing a function in which I am provided with a template and I need to return the identifier of the most similar template in the collection. I am using dot product to compute the similarity.
My naive implementation looks as follows:
// Should return false if no match is found (i.e. similarity is 0 for all templates in the collection)
bool identify(const float* data, unsigned int length, unsigned int& label, float& similarity) {
    bool found = false;
    similarity = 0.f;
    for (size_t i = 0; i < collection.size(); ++i) {
        const float* candidateTemplate = collection[i].first.data();
        // Computes the cosine similarity between two vectors; implementation depends on architecture.
        float cosineSimilarity = getSimilarity(data, candidateTemplate, length);
        if (cosineSimilarity > similarity) {
            found = true;
            similarity = cosineSimilarity;
            label = collection[i].second;
        }
    }
    return found;
}
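For reference, on architectures without a SIMD path my getSimilarity falls back to a plain scalar dot product along these lines (a sketch; it assumes both vectors are already L2-normalized, so the dot product equals the cosine similarity):

```cpp
#include <cstddef>

// Scalar fallback: plain dot product. If both inputs are L2-normalized,
// this is exactly the cosine similarity.
float getSimilarity(const float* a, const float* b, unsigned int length) {
    float sum = 0.f;
    for (unsigned int i = 0; i < length; ++i)
        sum += a[i] * b[i];
    return sum;
}
```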
How can I speed this up using parallelization? My collection can potentially contain millions of templates. I have read that you can add #pragma omp parallel for reduction, but I am not entirely sure how to use it (or whether it is even the best option).
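Since a plain reduction clause only handles the maximum value and not the associated label, the closest I can picture is giving each thread a private best match and merging the winners once at the end. A sketch of what I have in mind (untested; collection and getSimilarity are stand-ins for my real members):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Stand-ins for the real data members.
std::vector<std::pair<std::vector<float>, unsigned int>> collection;

float getSimilarity(const float* a, const float* b, unsigned int length) {
    float sum = 0.f;
    for (unsigned int i = 0; i < length; ++i)
        sum += a[i] * b[i];
    return sum;
}

// Each thread tracks its own best match; the per-thread winners are merged
// once inside a critical section, so there is no per-iteration locking.
bool identify(const float* data, unsigned int length,
              unsigned int& label, float& similarity) {
    bool found = false;
    similarity = 0.f;
    #pragma omp parallel
    {
        bool localFound = false;
        float localBest = 0.f;
        unsigned int localLabel = 0;
        #pragma omp for nowait
        for (long i = 0; i < static_cast<long>(collection.size()); ++i) {
            float s = getSimilarity(data, collection[i].first.data(), length);
            if (s > localBest) {
                localFound = true;
                localBest = s;
                localLabel = collection[i].second;
            }
        }
        #pragma omp critical
        if (localFound && localBest > similarity) {
            found = true;
            similarity = localBest;
            label = localLabel;
        }
    }
    return found;
}
```

(Without OpenMP enabled the pragmas are ignored and the loop just runs serially, which makes it easy to check the logic.)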
Also note: for my dot product implementation, if the base architecture supports AVX & FMA, I am using this implementation. Will this affect performance when we parallelize, since there are only a limited number of SIMD registers?
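For context, the AVX & FMA path I mean is roughly along these lines (my own sketch, not necessarily identical to the linked code; the horizontal-sum step is one of several possible variants, and it assumes length is a multiple of 8, which holds for my 512-float templates):

```cpp
#include <cstddef>
#if defined(__AVX__) && defined(__FMA__)
#include <immintrin.h>
#endif

// Dot product, vectorized with AVX + FMA when the compiler targets those
// instruction sets (e.g. -mavx -mfma); otherwise a scalar fallback.
float dotProduct(const float* a, const float* b, unsigned int length) {
#if defined(__AVX__) && defined(__FMA__)
    __m256 acc = _mm256_setzero_ps();
    // Process 8 floats per iteration: acc += a[i..i+7] * b[i..i+7].
    for (unsigned int i = 0; i < length; i += 8)
        acc = _mm256_fmadd_ps(_mm256_loadu_ps(a + i),
                              _mm256_loadu_ps(b + i), acc);
    // Horizontal sum of the 8 accumulator lanes.
    __m128 lo = _mm256_castps256_ps128(acc);
    __m128 hi = _mm256_extractf128_ps(acc, 1);
    __m128 s  = _mm_add_ps(lo, hi);
    s = _mm_hadd_ps(s, s);
    s = _mm_hadd_ps(s, s);
    return _mm_cvtss_f32(s);
#else
    float sum = 0.f;
    for (unsigned int i = 0; i < length; ++i)
        sum += a[i] * b[i];
    return sum;
#endif
}
```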