0

I need to define and create indices for a postgresql DB used for translation memory.

This is related to this (Database design question regarding performance) question I've posted and the oversimplified design follows this (How to design a database for translation dictionary?) answer. The only difference being I have a Segment (basically a sentence instead of a word).

Tables:

I. languages

ID   NAME  
---------------
1    English  
2    Slovenian  

II. segments

ID   CONTENT        LANGUAGE_ID  
-------------------------------
1    Hello World     1  
2    Zdravo, svet    2  

III. translation_records (TranslationRecord has more columns, omitted here, like domain, user etc.)

ID   SOURCE_SEGMENT_ID    TARGET_SEGMENT_ID  
--------------------------------------
1    1                   2  

I want to index the segments table for when searching existing translations and for when searching combination of words in the DB.

My question is, is it enough to create an index for the segments table for the CONTENT column or should I also tokenize the CONTENT column to a new column TOKENS and index that as well?

Also, am I missing something else that might be important for creating such indices?

---EDIT--- Querying examples:

  1. When a user enters a new text to translate, the app returns predefined number of existing translation records where source segment's content matches by a certain percent with the entered text.

  2. When a user triggers a manual query to list a predefined number of existing translation records where source segment's content includes the words marked by the user (i.e. the concordance search).

Since there is only one table for all language combinations the first condition for querying would be the language_combination (attribute of translation_record).

---EDIT---

Sebastjan Hribar
  • 396
  • 3
  • 13
  • Indexes are created to support _queries_ - it all depends on how you want to access and join the tables in your queries. The column `ID` in `translation_records` is unnecessary. You only need source and target IDs and defined the primary key over both columns. –  Mar 27 '20 at 08:29
  • Thank you for editing. I've added the two examples of queries. – Sebastjan Hribar Mar 27 '20 at 14:31

0 Answers0