I have 30 news articles that I love and would like to find more like them. I created embeddings using DistilBERT and saved them to Faiss and to Milvus in databases called ILoveTheseArticles (I'm trying both out). The feature vectors all have the same dimensionality and were built from the same maximum character length. As new news comes in, I would like to vectorize each new article, find the nearest article in ILoveTheseArticles, and get the distance. Based on that distance I would keep or discard the new article, almost like a binary classifier, without having to constantly train a kernel every time I add new similar articles.
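For reference, this is roughly how I create an embedding for a new article (a minimal sketch; I'm assuming `distilbert-base-uncased` and mean pooling of the last hidden state here, the exact model and pooling in my pipeline may differ):

```python
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")
model.eval()

def embed(text: str) -> np.ndarray:
    # Truncate to the same max length used for the 30 stored articles
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (1, seq_len, 768)
    # Mean-pool the token vectors into one 768-d article vector (float32 for Faiss)
    return hidden.mean(dim=1).squeeze(0).numpy().astype("float32")
```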
As a cosine similarity example (Figure 1): if OA and OB exist in ILoveTheseArticles and I search with a new embedding OC, I get OB closest to OC at 0.86. If the threshold for keeping is, say, 0.51, I would keep the OC article because it is similar to an article that I love.
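A minimal Faiss sketch of that keep/discard check with cosine similarity (assuming the 30 stored embeddings are already in a float32 NumPy array I'm calling `loved_vecs`; with L2-normalized vectors, inner product equals cosine similarity):

```python
import faiss
import numpy as np

d = 768                                     # DistilBERT hidden size
# loved_vecs: (30, 768) float32 matrix of the articles I love (hypothetical name)
faiss.normalize_L2(loved_vecs)              # normalize so IP == cosine similarity
index = faiss.IndexFlatIP(d)
index.add(loved_vecs)

def keep_by_cosine(new_vec: np.ndarray, threshold: float = 0.51) -> bool:
    q = new_vec.reshape(1, -1).copy()
    faiss.normalize_L2(q)
    sims, _ = index.search(q, 1)            # top-1 nearest loved article
    return float(sims[0][0]) >= threshold   # e.g. OB at 0.86 -> keep OC
```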
As an L2 example (Figure 1): if A' and B' exist in ILoveTheseArticles and I search with C' using a threshold of, say, 10.5, I would reject C' because the closest article, B', is at a distance of 20.62.
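The equivalent check with an L2 index would look something like this (using the original, un-normalized `loved_vecs`; note that Faiss's `IndexFlatL2` returns squared distances, so I take the square root before comparing to the Euclidean threshold):

```python
import faiss
import numpy as np

index_l2 = faiss.IndexFlatL2(768)
index_l2.add(loved_vecs)                       # same (30, 768) float32 matrix

sq_dists, _ = index_l2.search(new_vec.reshape(1, -1), 1)
dist = float(np.sqrt(sq_dists[0][0]))          # IndexFlatL2 returns squared L2
keep_article = dist <= 10.5                    # e.g. B' at 20.62 -> reject C'
```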
Is it possible to infer similar news articles with this approach, using embeddings and distance? I second-guess the approach when I read confusing answers to a similar-ish question. Is cosine similarity or inner product (IP) better than L2 in this scenario, or vice versa?