Given a block of lower-case text, how do you identify acronyms using a tool like Spacy, or something similar? I'm trying to intelligently capitalize words if they're proper nouns, and I'm having trouble identifying acronyms.
Spacy's POS tagger, applied through its standard document object, works reasonably well for identifying proper nouns, including most common acronyms. However, I don't see any easy way to distinguish a short name from an acronym in the tokens it returns.
For example:
import spacy

nlp = spacy.load('en_core_web_lg')
text = 'joe bought stock in ibm'
doc = nlp(text)
# Print each token's index, text, and coarse-grained POS tag
for i, token in enumerate(doc):
    print(i, token.text, token.pos_)
prints out:
0 joe PROPN
1 bought VERB
2 stock NOUN
3 in ADP
4 ibm PROPN
So it correctly identified the two proper nouns. However, nothing in tokens 0 or 4 seems to indicate that one is a regular name and the other is an acronym.
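To make the goal concrete, here is a minimal sketch of the capitalization step I'm after. It takes the (text, POS) pairs printed above as input and assumes a hand-maintained acronym list. `KNOWN_ACRONYMS` and `capitalize_propn` are hypothetical names, not part of Spacy. That lookup table is exactly the part I'd like to replace with something more reliable.

```python
# Hypothetical, hand-maintained list of acronyms (stored lower-case).
KNOWN_ACRONYMS = {"ibm", "nasa", "usa"}


def capitalize_propn(tagged_tokens, acronyms=KNOWN_ACRONYMS):
    """Re-case (text, pos) pairs.

    PROPN tokens found in `acronyms` are upper-cased, other PROPN tokens
    are title-cased, and all remaining tokens are left unchanged.
    Tokens are joined with single spaces for simplicity, so the original
    whitespace is not preserved.
    """
    out = []
    for text, pos in tagged_tokens:
        if pos == "PROPN":
            if text.lower() in acronyms:
                out.append(text.upper())
            else:
                out.append(text.capitalize())
        else:
            out.append(text)
    return " ".join(out)


# (text, pos_) pairs as printed by the Spacy loop above.
tagged = [("joe", "PROPN"), ("bought", "VERB"), ("stock", "NOUN"),
          ("in", "ADP"), ("ibm", "PROPN")]
print(capitalize_propn(tagged))  # Joe bought stock in IBM
```

This only works for acronyms someone has already added to the set, which is why I'm looking for a model-based or otherwise more general signal.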
I can't find anything in the docs that clarifies this. Is there any way in Spacy to detect an acronym? If not, are there any other reliable ways?