I am doing a little research on text mining and data mining, and I need help understanding cosine similarity. I have read about it and noticed that all the examples I can find online apply tf-idf before computing cosine similarity.
My questions:
1. Is it possible to calculate cosine similarity using just the frequency distribution of the most frequent terms in a text file, which would be the dataset? Most of the videos and tutorials that I have gone through run tf-idf before feeding its output into cosine similarity. If that is not possible, what other kinds of weighting schemes or algorithms can be used as input to cosine similarity?
2. Why is normalization used with tf-idf to compute cosine similarity? The tutorials compute cosine similarity from the normalized tf-idf output. Why is that normalization needed, and can I do it without normalization?
3. What does cosine similarity actually do to the tf-idf weights?
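
To make the first two questions concrete, here is a minimal sketch of what I mean by computing cosine similarity directly on raw term counts, with no tf-idf step. The two sentences, the whitespace tokenizer, and the helper names (`term_counts`, `cosine_similarity`, `l2_normalize`) are my own assumptions for illustration, not taken from any tutorial. It also checks whether normalizing the vectors to unit length first and then taking a plain dot product gives the same number.

```python
import math
from collections import Counter

def term_counts(text):
    # Naive tokenization (assumption): lowercase and split on whitespace.
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    # Dot product over shared terms, divided by the product of the vector lengths.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def l2_normalize(vec):
    # Scale a vector to unit (L2) length; an all-zero vector is returned unchanged.
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else dict(vec)

# Two made-up documents, represented by raw term frequencies only.
doc1 = term_counts("the cat sat on the mat")
doc2 = term_counts("the cat lay on the rug")

sim = cosine_similarity(doc1, doc2)

# Same comparison, but normalize first and then take a plain dot product.
u1, u2 = l2_normalize(doc1), l2_normalize(doc2)
dot_unit = sum(u1[t] * u2[t] for t in u1.keys() & u2.keys())

print(sim, dot_unit)
```

If I am reading this correctly, both printed values should agree, which makes me suspect the normalization step in the tutorials is just the denominator of the cosine formula applied ahead of time, but I would like confirmation.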