I want to make a feature matching algorithm which works by comparing detected object embedding with embeddings inside a database then search for the lowest distance. an embedding consist of 16 float32 values.
Currently there are 500 thousand embeddings in a database which all need to be compared with an input embedding each time a particular object detected (ex : car)
My current method is to load all database data to an Array in python and do iteration for each embedding to calculate lowest eucledian distance. It take tremendously long time if using loop (up to 1 second). I need about 30ms
Here is my code :
from sqlitedict import SqliteDict
import numpy as np
dictCarDb = SqliteDict('./carDatabase.db', autocommit=True)
"""
dictCarDb Dictionary Format :
dictCarDb = {
"Car1" : [embedding1, embedding2, embedding3, ...],
"Car2" : [embedding1, embedding2, embedding3, ...],
"Car3" : [embedding1, embedding2, embedding3, ...],
.....................
}
CONVERTED_DATA List Format :
CONVERTED_DATA = [
["Car1", [embedding1, embedding2, embedding3, ...]],
["Car2", [embedding1, embedding2, embedding3, ...]],
["Car3", [embedding1, embedding2, embedding3, ...]],
.....................
]
embeddingInput List Format :
embeddingInput = [embedding1, embedding2, embedding3, ...]
"""
CONVERTED_DATA = []
for carName in dictCarDb:
embedding = dictCarDb[carName]
CONVERTED_DATA.append([carName, embedding])
def calculateDistance(embedding1, embedding2):
dist = np.linalg.norm(embedding1 - embedding2)
return dist
def searchCarFromArray(embeddingInput):
smallestVal = ["Uknown", 1000]
for carNameDb, carEmbeddingDb in CONVERTED_DATA:
euclidVal = calculateDistance(carEmbeddingDb, embeddingInput)
if euclidVal < smallestVal[1]:
smallestVal = [carNameDb, euclidVal]
if smallestVal[1] < 0.5:
return smallestVal[0]
return "Unknown"
=================== UPDATE (22/2/2022) ===================
I've tried suggestions from @DominiqueGarmier and @WillemHendriks below and find out that faiss (https://github.com/facebookresearch/faiss) is the solution.
below code can execute my goal in less than 30ms (it was 1 second if using python "for" loop) :
import faiss
.......
.......
def searchCarFromArray(embeddingInput):
dimensionEmb = 16
CONVERTED_DATA_NP = np.array(CONVERTED_DATA, dtype=object)
embeddingsdB = CONVERTED_DATA_NP[:, 1].tolist()
embeddingsdB_np = np.array(embeddingsdB, dtype=np.float32)
embeddingInput_np = np.array(embeddingInput, dtype=np.float32)
embeddingInput_np = embeddingInput_np.reshape(1, -1)
index = faiss.IndexFlatL2(dimensionEmb)
index.add(embeddingsdB_np)
D, I = index.search(embeddingInput_np, 1)
indexArr = I[0][0]
carName = CONVERTED_DATA_NP[indexArr][0]
carEmb = CONVERTED_DATA_NP[indexArr][1]
euclidVal = D[0][0]
if euclidVal < 0.5:
return carName
return "Unknown"