0

I have a project about database tables. I have to figure out which table is related to the others. For example; In my database, I have two tables. One of them's name is 'product' and another one's name is DimProduct. These are related to each other. I used an algorithm. It detects containment relationships.

Here comes my problem. I have a table name 'products' because of 's' the algorithm is not working. I need another way to find these tables' base nouns. What should I do? Are there any NLP library in python or any function to find noun?

3 Different Table Names About Products

samitugal
  • 5
  • 4
  • 1
    Surely you'd want to do this by looking at the foreign keys of the database table, right? Is NLP really the easiest path here? – Nick ODell Oct 14 '22 at 17:44
  • Actually, I thought about it but I want to find the relation between different databases' tables. As an example, I have a database called Market Which has a productDim table, and I have another database called BikeStore. It also has a product table. These tables' context are the same but they are not sharing a foreign key. – samitugal Oct 14 '22 at 18:38
  • You could try LCS, longest common substring, which is a pretty simple algorithm for checking the similarity of two strings. – Nick ODell Oct 14 '22 at 18:42
  • But how can I detect whether these common substrings are meaningful? – samitugal Oct 15 '22 at 09:54
  • By length of substring relative to length of the two original strings. – Nick ODell Oct 15 '22 at 16:47

1 Answers1

0

I'll try and answer the question, although strictly speaking, you should probably re-structure your database and relationships to a more suitable schema, incorporating the relationships within tables, rather than using the table name. For "base nouns", there are several things you can try:

  • To split up the words, I'll refer you to this answer. However, given that you have an algorithm that is having trouble with "Product" vs "Products", I will assume you can already isolated "Product" from "DimProduct".

  • To normalize, you can Porter stem each word, and then find the common words between related tables:

    from nltk.stem import *
    stemmer = PorterStemmer()
    stemmer.stem('Product')
    # 'product'
    stemmer.stem('Products')
    # 'product'
  • If Porter stemming is too destructive, or presents duplication, then you can use the inflect library to first check if each word is plural, and then singularize if it is:
    import inflect
    p = inflect.engine()
    p.singular_noun('Product')
    # False
    p.singular_noun('Products')
    # 'Product'
  • Another, more intensive way to find the "base form" of a word would be to use spaCy's Lemmatizer:
    import spacy
    nlp = spacy.load("en_core_web_sm")
    nlp('Product')[0].lemma_
    # 'product'
    nlp('Products')[0].lemma_
    # 'product'

Spacy is one of my favourite NLP tools, and you can use it to do a plethora of things, here are some basic examples to start playing around with. If you are more interested in word similarity, then (for example) Wordnet, Gensim, and Spacy are good starting points, although there are many, many others out there. I hope this helps you a little bit!

Eugene
  • 1,539
  • 12
  • 20