Since you have labels for the 100 classes, this is in principle a fairly standard outlier detection problem: you need to find documents that do not resemble most of the other documents carrying the same label.
As you suggest, you can use cosine similarity (on word counts, I assume) to score the similarity of pairs of documents. There are many practical issues involved with cosine similarity, such as the selection of important words, stemming, stop words and so on, and you may also wish to account for similarity between words themselves via soft cosine similarity.
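As a concrete starting point, here is a minimal sketch of cosine similarity on word-count vectors (the function name and the toy vectors are my own, for illustration):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two word-count vectors over a shared
    # vocabulary; returns 0.0 if either vector is all zeros.
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return float(np.dot(a, b) / (na * nb))

# Two toy documents as counts over a 4-word vocabulary
doc_a = np.array([3, 0, 1, 2])
doc_b = np.array([1, 0, 0, 1])
print(cosine_similarity(doc_a, doc_b))
```

Since cosine similarity ignores vector length, it compares word *proportions* rather than raw counts, which is what you want when documents vary in length.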
It would be impractical to calculate all pairwise cosine similarities for such a large corpus, so you will need to summarise each class somehow. A simple method is to average the word counts over all documents in a class, producing a "model document", and then measure the similarity between this model document and each member of the class; scoring a document then requires only a single cosine similarity. You can then reject some chosen percentile of the lowest-scoring documents as potentially misclassified, with a threshold comparable to the percentage of misclassified documents you expect. Obviously a higher threshold will eliminate more errors, but also more correctly classified documents.
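The centroid-and-percentile scheme above might look like the following sketch (the function name, the 5% default, and the toy data are my own assumptions, not from the question):

```python
import numpy as np

def flag_outliers(X, labels, percentile=5.0):
    """Score each document by cosine similarity to its class centroid
    and flag the lowest-scoring `percentile` percent within each class.
    X: (n_docs, n_words) word-count matrix; labels: class label per row."""
    scores = np.zeros(len(X), dtype=float)
    flagged = np.zeros(len(X), dtype=bool)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        centroid = X[idx].mean(axis=0)           # the "model document"
        cn = np.linalg.norm(centroid)
        norms = np.linalg.norm(X[idx], axis=1)
        denom = np.where(norms * cn == 0, 1.0, norms * cn)
        sims = (X[idx] @ centroid) / denom       # one cosine sim per document
        scores[idx] = sims
        cutoff = np.percentile(sims, percentile) # reject the bottom percentile
        flagged[idx] = sims <= cutoff
    return scores, flagged

# One class of three similar documents plus one obvious outlier
X = np.array([[5, 0, 0], [4, 1, 0], [5, 1, 0], [0, 0, 9]], dtype=float)
labels = np.array([0, 0, 0, 0])
scores, flagged = flag_outliers(X, labels, percentile=25.0)
```

Note that the outlying documents themselves are included when computing the centroid; with a large class this barely matters, but with small classes you might prefer a robust summary such as the per-word median.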
A better implementation might be to apply a fast clustering algorithm separately to each of the 100 classes. The average word count within each cluster then gives you a handful of model documents per label, and you can use the highest similarity to any of them as a document's score.
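That per-class clustering step could be sketched with k-means (one plausible choice of fast clustering algorithm; the question does not fix a particular one, and the function name and cluster count here are my own):

```python
import numpy as np
from sklearn.cluster import KMeans

def score_against_clusters(X_class, n_clusters=2, random_state=0):
    """Cluster one class's word-count matrix and score each document by
    its highest cosine similarity to any cluster centroid."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    km.fit(X_class)
    centroids = km.cluster_centers_              # (n_clusters, n_words)
    # Normalise rows so a dot product equals cosine similarity
    Xn = X_class / np.maximum(np.linalg.norm(X_class, axis=1, keepdims=True), 1e-12)
    Cn = centroids / np.maximum(np.linalg.norm(centroids, axis=1, keepdims=True), 1e-12)
    sims = Xn @ Cn.T                             # (n_docs, n_clusters)
    return sims.max(axis=1)                      # similarity to best model document

# A class with two subtopics plus one document that fits neither well
X = np.array([[9, 1, 0], [8, 2, 0], [9, 2, 0],
              [0, 1, 9], [0, 2, 8], [1, 1, 9],
              [5, 5, 5]], dtype=float)
scores = score_against_clusters(X, n_clusters=2)
```

The advantage over a single centroid is that a class containing several distinct subtopics is summarised by one model document per subtopic, so a legitimate member of a minority subtopic is not wrongly flagged just for being far from the overall class average.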