To extract more information from annual reports (10-Ks), I am trying to compare companies based on cosine similarity. One of the steps in this research is stemming or lemmatizing the words. The reason for doing this is to reduce each word to its root, so that different variations of a word that mean the same thing at their core are not treated as different terms. For the stemmer and the lemmatizer, I used the Snowball stemmer and WordNetLemmatizer from the NLTK package.
Example of stemming      Example of lemmatization
walking -> walk          walking -> walking
walked  -> walk          walked  -> walked
or
owing   -> owe           owing   -> owing
owed    -> owe           owed    -> owed
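To make the effect on the comparison concrete, here is a minimal sketch of how word normalization feeds into cosine similarity. It uses only the standard library, and `toy_stem`, `identity`, `term_counts`, and the two sample sentences are hypothetical names and data made up for illustration. `toy_stem` is a crude suffix stripper standing in for a real stemmer, not the Snowball algorithm.

```python
import math
import re
from collections import Counter
from typing import Callable


def identity(word: str) -> str:
    """No normalization: every surface form stays a separate term."""
    return word


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words like "is" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_counts(text: str, normalize: Callable[[str], str]) -> Counter:
    """Lowercase, tokenize on letters, normalize each token, count terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(normalize(token) for token in tokens)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up sentences, not taken from any real filing.
doc_a = "The company reported rising loans"
doc_b = "The company is reporting a loan"

raw = cosine_similarity(term_counts(doc_a, identity), term_counts(doc_b, identity))
stemmed = cosine_similarity(term_counts(doc_a, toy_stem), term_counts(doc_b, toy_stem))

# With normalization, "reported"/"reporting" and "loans"/"loan" collapse to
# shared terms, so the similarity score goes up.
print(f"raw: {raw:.3f}  stemmed: {stemmed:.3f}")
```

In a real pipeline the `normalize` argument could be `SnowballStemmer("english").stem` or `WordNetLemmatizer().lemmatize` from `nltk.stem`. One thing worth checking: `WordNetLemmatizer.lemmatize` takes a `pos` argument that defaults to `"n"` (noun), so verb forms such as "walking" or "owed" come back unchanged unless `pos="v"` is passed, which may explain the lemmatization column above.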
The question is the following: should I use the stemmer or a lemmatizer for financial text?
The way I see it, a stemmer would be more appropriate for this kind of research.
Disclaimer: I know there is already a question discussing stemming vs. lemmatization on Stack Overflow. However, I am looking for clarification regarding financial text in particular, not the general case.