I'm trying to do sentiment analysis on financial news, and I want to be able to recognise companies from their ticker symbols, e.g. recognise Spotify from SPOT. The final objective is to build a sentiment model for each company. spaCy is pretty good at named entity recognition out of the box, but it falls short when matching a ticker symbol to its company. I have a list of ticker symbols and company names (from NASDAQ, NYSE and AMEX) in CSV format.
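To make the goal concrete, the mapping I'm after can be sketched with the standard library alone, as a plain ticker lookup. This is only an illustration: the column names `Symbol` and `Name`, the sample rows, and the helper names `load_tickers` and `find_companies` are assumptions, not the real layout of my file.

```python
import csv
import io
import re

# Hypothetical sample standing in for the exchange listings; the column
# names "Symbol" and "Name" are assumptions about the CSV layout.
SAMPLE_CSV = """Symbol,Name
CSPI,CSP Inc.
CHGG,"Chegg, Inc."
QADA,QAD Inc.
SPOT,Spotify Technology S.A.
"""


def load_tickers(csv_file):
    """Return a dict mapping ticker symbol -> company name."""
    reader = csv.DictReader(csv_file)
    return {row["Symbol"].strip(): row["Name"].strip() for row in reader}


def find_companies(text, tickers):
    """Return (ticker, company) pairs for upper-case words that are known tickers."""
    # Candidate tickers: standalone runs of 1 to 5 capital letters.
    candidates = re.findall(r"\b[A-Z]{1,5}\b", text)
    return [(symbol, tickers[symbol]) for symbol in candidates if symbol in tickers]


tickers = load_tickers(io.StringIO(SAMPLE_CSV))
print(find_companies("SPOT stock jumped while CHGG slipped.", tickers))
```

An exact lookup like this would wrongly tag ordinary capitalised words that happen to be tickers, which is part of why I'm looking at spaCy's matching tools instead.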
Using spaCy's similarity() function, the results aren't good so far. The table below shows a sample of companies with a low similarity score, even though the ticker and the company name look similar. I'd like to train the model on the list of company names and ticker symbols so that the similarity scores are higher afterwards.
+------------+-------------------------+------------+
| Stock | Name | Similarity |
+------------+-------------------------+------------+
| CSPI stock | CSP Inc. | 0.072 |
| CHGG stock | Chegg, Inc. | 0.071 |
| QADA stock | QAD Inc. | 0.065 |
| SPOT stock | Spotify Technology S.A. | 0.064 |
+------------+-------------------------+------------+
Based on spaCy's documentation, the candidate tools seem to be PhraseMatcher, EntityRuler, and rule-based matching with the token-based Matcher. Which one would be best suited to this use case?
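For reference, here is a minimal sketch of what the EntityRuler option might look like, assuming spaCy v3 (where `nlp.add_pipe("entity_ruler")` returns the component). The `listings` rows are hypothetical stand-ins for my CSV, and using the ticker as the pattern `id` is just one possible way to tie both spellings to the same company.

```python
import spacy

# A blank English pipeline is enough for rule-based matching; no trained
# model is needed. Assumes spaCy v3.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Hypothetical rows standing in for the CSV of tickers and company names.
listings = [
    ("SPOT", "Spotify Technology S.A."),
    ("CHGG", "Chegg, Inc."),
]

# Register the ticker and the full name as patterns that share one id,
# so either spelling resolves to the same company.
patterns = []
for symbol, name in listings:
    patterns.append({"label": "ORG", "pattern": symbol, "id": symbol})
    patterns.append({"label": "ORG", "pattern": name, "id": symbol})
ruler.add_patterns(patterns)

doc = nlp("SPOT stock rose after Spotify Technology S.A. reported earnings.")
matches = [(ent.text, ent.label_, ent.ent_id_) for ent in doc.ents]
print(matches)
```

String patterns are matched exactly and case-sensitively by default, so this would not catch "Spotify" on its own unless that variant is added as a pattern too.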