I need to define and create indices for a postgresql DB used for translation memory.
This is related to this (Database design question regarding performance) question I've posted and the oversimplified design follows this (How to design a database for translation dictionary?) answer. The only difference being I have a Segment (basically a sentence instead of a word).
Tables:
I. languages
ID NAME
---------------
1 English
2 Slovenian
II. segments
ID CONTENT LANGUAGE_ID
-------------------------------
1 Hello World 1
2 Zdravo, svet 2
III. translation_records
(TranslationRecord has more columns, omitted here, like domain, user etc.)
ID SOURCE_SEGMENT_ID TARGET_SEGMENT_ID
--------------------------------------
1 1 2
I want to index the segments table for when searching existing translations and for when searching combination of words in the DB.
My question is, is it enough to create an index for the segments table for the CONTENT column or should I also tokenize the CONTENT column to a new column TOKENS and index that as well?
Also, am I missing something else that might be important for creating such indices?
---EDIT--- Querying examples:
When a user enters a new text to translate, the app returns predefined number of existing translation records where source segment's content matches by a certain percent with the entered text.
When a user triggers a manual query to list a predefined number of existing translation records where source segment's content includes the words marked by the user (i.e. the concordance search).
Since there is only one table for all language combinations the first condition for querying would be the language_combination
(attribute of translation_record
).
---EDIT---