5

I have database with words and phrases from for exp. English to 15 other languages, and also for every language in that list to other 15. For one pair they are sort for now in one table like this (en -> de):

  • id_pair
  • word_en
  • word_de

What is the best way to create database for that huge list of words and phrases? I know that I must separate every primary language from others, and was thinking maybe like this:

ENGLISH
ID | WORD
1  | 'dictionary'

GERMAN
ID | WORD
1  | 'lexikon'
2  | 'wörterbuch'

TRANSLATION_EN_DE
ID_EN | ID_DE
1     | 1
1     | 2

Is this the best way to normalize DB? But what is with phrases, I need also if someone enter word "dictionay" that this returns also "This dictionary is good" and translation for that. (I know this can find in first table with sql query, is that best way?)

Also need it alphabetically all time, I will have lot of new entry daily, so I can print couple words before and after the word/phases someone looking for translate.

I'm stuck and cant decide what is the best way to optimize it. These db have all together more than 15gb just text based translation, and around 100k daily req, so every ms worth. :) Any help will be appreciate, thx!

Ivan Loguz
  • 91
  • 1
  • 7

1 Answers1

7

With separate table for each language, you'd need a large number of junction tables to cover all the possible translation combinations. On top of that, adding a new language would require adding more tables, rewriting the queries, client code etc.

It's better to do it in a more generalized way, similar to this:

enter image description here

Regarding the TRANSLATION table, I propose to also create a CHECK (WORD_ID1 < WORD_ID2) and create an index {WORD_ID2, WORD_ID1} (the opposite "direction" from the PK), and represent the both directions of the translation with only one row.

Consider clustering the TRANSLATION table if your DBMS supports that.

Also need it alphabetically all time

The query...

SELECT * FROM WORD WHERE LANGUAGE_ID = :lid ORDER BY WORD_TEXT

...can use the index underneath the UNIQUE constraint {LANGUAGE_ID, WORD_TEXT}.

Branko Dimitrijevic
  • 50,809
  • 10
  • 93
  • 167
  • Thanks for your help Branko, I was think with separate tables for each language I will save time and server performance for searching translations. Could there be a problem now while all words are in one table and it will be a large for searching specific word on one lang? I must ask because I have a lot off req daily and its growing. – Ivan Loguz Jun 04 '13 at 14:59
  • @IvanZg Assuming all tables are on the same physical disk, then no - searching for the right table in the "multiple table" scenario also has its cost and the end result, performance-wise, should be quite similar. If you intend to put tables on different disks, similar effect can be achieved by partitioning (assuming your DBMS supports it) instead of splitting to separate tables. – Branko Dimitrijevic Jun 04 '13 at 15:06
  • And what about with words inside phases for exp if I want to give sentence how to use word? Do I use still %WORD% for selecting it? – Ivan Loguz Jun 04 '13 at 15:12
  • @IvanZg I'm not completely clear on your requirements there - I simply added the DESCRIPTION field into WORD table, assuming the phrase(s) can be stored there. There are other possibilities - for example having another table, each row containing one phrase, and multiple phrase rows being potentially related to the same word... But if you need to search **inside** the phrase (and not just for WORD_TEXT), then you'll probably be better off with some kind of full-text index, instead of decomposing the data model any further... – Branko Dimitrijevic Jun 04 '13 at 15:44
  • Thx again, I was thinking use DESCRIPTION for meaning of WORD_TEXT on native lang. Phrases are also translated, so yes I need search inside the phrase. – Ivan Loguz Jun 04 '13 at 16:20