I'll try and answer the question, although strictly speaking, you should probably re-structure your database and relationships to a more suitable schema, incorporating the relationships within tables, rather than using the table name. For "base nouns", there are several things you can try:
To split up the words, I'll refer you to this answer. However, given that you have an algorithm that is having trouble with "Product" vs "Products", I will assume you can already isolated "Product" from "DimProduct".
To normalize, you can Porter stem each word, and then find the common words between related tables:
from nltk.stem import *
stemmer = PorterStemmer()
stemmer.stem('Product')
# 'product'
stemmer.stem('Products')
# 'product'
- If Porter stemming is too destructive, or presents duplication, then you can use the
inflect
library to first check if each word is plural, and then singularize if it is:
import inflect
p = inflect.engine()
p.singular_noun('Product')
# False
p.singular_noun('Products')
# 'Product'
- Another, more intensive way to find the "base form" of a word would be to use spaCy's Lemmatizer:
import spacy
nlp = spacy.load("en_core_web_sm")
nlp('Product')[0].lemma_
# 'product'
nlp('Products')[0].lemma_
# 'product'
Spacy is one of my favourite NLP tools, and you can use it to do a plethora of things, here are some basic examples to start playing around with. If you are more interested in word similarity, then (for example) Wordnet, Gensim, and Spacy are good starting points, although there are many, many others out there. I hope this helps you a little bit!