Is there any free database which stores keywords with other relevant keywords, for applications to determine semantic relevance?

Question

This looks like a search for a valuable asset, but since we have a free alternative for many things, I am optimistic about this one.

A database that stores entries in a form like

key-value

or

key-context-value

would be very useful for web developers who collect data and want to tag it, or who search for records that may be relevant.

A data table like this would even be the normalized form of what they would want to store.
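To make the key-context-value idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names and sample rows are all hypothetical and only illustrate the shape of the data; they do not come from any existing dataset.

```python
import sqlite3

# Hypothetical schema: one row per (keyword, context, related keyword) triple.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE keyword_relation (
        keyword TEXT NOT NULL,
        context TEXT NOT NULL,
        related TEXT NOT NULL,
        PRIMARY KEY (keyword, context, related)
    )
    """
)

# Illustrative sample rows, invented for this example.
rows = [
    ("jaguar", "animal", "leopard"),
    ("jaguar", "animal", "cat"),
    ("jaguar", "car", "sedan"),
    ("bus", "transport", "vehicle"),
]
conn.executemany("INSERT INTO keyword_relation VALUES (?, ?, ?)", rows)


def related_keywords(keyword, context=None):
    """Return related keywords, optionally restricted to one context."""
    sql = "SELECT related FROM keyword_relation WHERE keyword = ?"
    params = [keyword]
    if context is not None:
        sql += " AND context = ?"
        params.append(context)
    sql += " ORDER BY related"
    return [row[0] for row in conn.execute(sql, params)]


print(related_keywords("jaguar", "animal"))
print(related_keywords("jaguar"))
```

Leaving out the context argument gives the plain key-value view, so both forms from the question can live in the same table.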

If you have ever heard of a freely copyable data table like this, please share. Thank you.

Attila · Answer 1 · 2012-06-15T20:16:49.097

You could use WordNet: it contains general relationships between (English) words (divided into noun, verb, adjective and adverb). The relationships are among synsets (synonym sets) and describe such relations as "bus" is-a "vehicle", "wheel" is-part-of "car".

Note: To look up words in the WordNet dictionary you need to use lemmas (the base form of the word), so if you want to look words up from a free text (such as a website), you will have to calculate the lemmas of the words first. You could do this by applying some Natural Language Processing (NLP) techniques, or creating your own heuristics.
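As a rough illustration of the "own heuristics" route, the sketch below reduces words to a guessed base form with a small irregular-form table and a few suffix rules. The table and rules are assumptions made up for this example; they are deliberately crude and will get many words wrong (for example "buses" or "running"), so a proper NLP lemmatizer is the safer choice for real text.

```python
import re

# Hypothetical irregular forms; a real system would need a much larger table.
IRREGULAR = {"sang": "sing", "sung": "sing", "mice": "mouse", "went": "go"}

# (suffix, replacement) pairs tried in order; the first match wins.
SUFFIX_RULES = [
    ("ies", "y"),
    ("sses", "ss"),
    ("ss", "ss"),
    ("ing", ""),
    ("ed", ""),
    ("s", ""),
]


def naive_lemma(word):
    """Guess the base form of a single word with crude heuristics."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix, replacement in SUFFIX_RULES:
        # Require at least 3 remaining characters to avoid mangling short words.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word


def lemmas(text):
    """Split free text into alphabetic tokens and guess a lemma for each."""
    return [naive_lemma(token) for token in re.findall(r"[a-z]+", text.lower())]


print(lemmas("The wheels were singing"))
```

The resulting lemmas are what you would then look up in the dictionary.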

Besides the synset relationships, WordNet also contains short definitions (glosses) of the synsets, which you could use to gain more context. Also, word sense disambiguation techniques can help you decide which sense of a multi-sense word to use, which is also a form of providing context.

If you need more context than what WordNet provides (general relationships between general meanings of English words), you should find a suitable ontology that describes semantic relationships between concepts. You will have to map the text to the concepts it is about (again, NLP techniques can help with this).

Example ontologies: SUMO, MSO, etc.

score 1 · Answer 2 · answered Jun 22 '12 at 02:39

You could use Lucene (or any text-search engine) to store your documents, combined with a custom stemmer to index your document text based on meaning (rather than word variations).

Normally, stemmers are used to convert all variations of a word to the base word stem. For example, although the document is stored and retrieved with text as-is, any of the words "sing, singing, sang, sung" would be indexed as "sing", so when a search is made using the search term "sing", you get a hit on all documents containing sing, singing, sang or sung.

Similarly, the search terms may also be stemmed, so searching for any of "sing, singing, sang or sung" will search as if "sing" is the search term.

Standard stemmers deal with the usual English variations of words, but you could create one that stems based on meaning. For example, you might create a stemmer that stems any of "problem, issue or complaint" to "problem", etc for all words you want to "link".
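The idea can be sketched without any search engine at all. The code below is plain Python, not Lucene's API: a hypothetical CANONICAL map plays the role of the meaning-based stemmer, and a tiny inverted index (the TinyIndex class, also made up for this example) applies it at both index time and search time.

```python
import re
from collections import defaultdict

# Hypothetical "meaning" map: every variant points at one canonical term.
CANONICAL = {
    "issue": "problem",
    "complaint": "problem",
    "singing": "sing",
    "sang": "sing",
    "sung": "sing",
}


def meaning_stem(token):
    """Map a token to its canonical term, or leave it unchanged."""
    token = token.lower()
    return CANONICAL.get(token, token)


def tokenize(text):
    """Split text into alphabetic tokens and stem each one by meaning."""
    return [meaning_stem(t) for t in re.findall(r"[A-Za-z]+", text)]


class TinyIndex:
    """Toy inverted index: documents are stored as-is, terms are stemmed."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.documents = {}

    def add(self, doc_id, text):
        self.documents[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every (stemmed) query term."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)


index = TinyIndex()
index.add(1, "Customer filed a complaint about billing")
index.add(2, "Known issue with login")
print(index.search("problem"))
```

Because the same mapping is applied to documents and queries, a search for "issue", "complaint" or "problem" hits the same set of documents.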

The advantage of using a stemmer is all the search-related heavy lifting is done for you by the text search engine (and besides, text search engines are incredibly fast!).

When it comes to implementation, you could make the linkages data-driven: either generate the code for the stemmer from data in a database, or make it dynamic and query a database whenever a search/index operation is done, or something in between, caching the values and refreshing them periodically.
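The "somewhere in between" option might look roughly like the sketch below. The CachedSynonymMap class and its loader argument are hypothetical names for this example: loader stands in for whatever database query returns the current word-to-canonical-term mapping, and the mapping is reloaded only once it is older than a chosen time-to-live.

```python
import time


class CachedSynonymMap:
    """Keeps a word -> canonical term mapping, reloading it when stale."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader      # callable returning a dict (e.g. a DB query)
        self._ttl = ttl_seconds    # how long a loaded mapping stays fresh
        self._clock = clock        # injectable so the refresh can be tested
        self._mapping = {}
        self._loaded_at = None

    def _refresh_if_stale(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._mapping = dict(self._loader())
            self._loaded_at = now

    def canonical(self, word):
        """Return the canonical term for a word, or the word itself."""
        self._refresh_if_stale()
        word = word.lower()
        return self._mapping.get(word, word)


# Stand-in for a database lookup; a real loader would run a query instead.
def load_from_database():
    return {"issue": "problem", "complaint": "problem"}


synonyms = CachedSynonymMap(load_from_database, ttl_seconds=60)
print(synonyms.canonical("Complaint"))
```

The time-to-live is the knob that trades freshness against database load: a short value approaches the fully dynamic option, a very long one approaches generated code.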

score 0 · Answer 3 · answered Jun 22 '12 at 09:50

Depending on your requirements, you can look at different implementations of the map-reduce paradigm. The most famous one is Hadoop, specifically Hadoop MapReduce. Though this is a framework rather than a database, it does exactly what you ask: storing and processing data as key=value pairs. It is a product for building large, scalable systems. If you need something simpler, there are smaller implementations, such as PHP-based ones (on top of MySQL), and even a "simple" MySQL aggregation, which can mimic MapReduce in most cases where you do not need a distributed system with loads of data.
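For small data sets the map and reduce steps can be imitated in a few lines of plain Python. The sketch below is not Hadoop's API; it is a hypothetical example that assumes each document has already been reduced to a list of keywords, and it counts how often two keywords appear together as a crude relevance signal.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(doc_keywords):
    """Emit ((a, b), 1) for every unordered keyword pair in one document."""
    for a, b in combinations(sorted(set(doc_keywords)), 2):
        yield (a, b), 1


def reduce_phase(pairs):
    """Sum the emitted values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def cooccurrence(documents):
    """Run the map step over every document, then reduce the results."""
    emitted = (pair for doc in documents for pair in map_phase(doc))
    return reduce_phase(emitted)


# Illustrative input, invented for this example.
docs = [
    ["bus", "vehicle", "wheel"],
    ["bus", "vehicle"],
    ["wheel", "car"],
]
for pair, count in sorted(cooccurrence(docs).items()):
    print(pair, count)
```

A SQL GROUP BY over a table of keyword pairs performs the same aggregation, which is what the "simple" MySQL variant amounts to.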

score 0 · Answer 4 · edited May 23 '17 at 11:44

It sounds very much like you are talking about an ontology. See What is an Ontology (Database?)?

It seems to me that ontologies provide a very powerful way of building up complex models of real-world entities and relationships in a natural and organic way. Relationships between entities/concepts can be captured in the model, and as the number of types of relationship grows, more and more sophisticated rules can be encoded to exploit this body of knowledge.

score 0 · Answer 5 · answered Jun 22 '12 at 15:41

The format sounds like JSON objects, so I looked at Wikipedia and found CouchDB, an open source database that uses JSON to store data.

Aprillion