1

I am implementing a translation dictionary with a structure based on this post: https://stackoverflow.com/a/16919801/4527140

The server DB is MySQL, and the client is SQLite (on Android), but for this particular issue, only the server (MySQL) comes into play.

Table structure:

CREATE TABLE word (
  _id INT NOT NULL AUTO_INCREMENT,
  language_id INT NOT NULL,
  word VARCHAR(255) NOT NULL,
  update_time DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (_id) ,
  UNIQUE INDEX ui_word_lang_idx (language_id ASC, word ASC) ,
  CONSTRAINT fk_word_language
    FOREIGN KEY (language_id)
    REFERENCES language (_id))

CREATE TABLE IF NOT EXISTS translation (
  _id INT NOT NULL AUTO_INCREMENT,
  word_id1 INT NOT NULL,
  word_id2 INT NOT NULL,
  update_time DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (_id) ,
  UNIQUE INDEX ui_word1_word2_idx (word_id1 ASC, word_id2 ASC) ,
  UNIQUE INDEX ui_word2_word1_idx (word_id2 ASC, word_id1 ASC) ,
  CONSTRAINT fk_translation_word1
    FOREIGN KEY (word_id1)
    REFERENCES word (_id),
  CONSTRAINT fk_translation_word2
    FOREIGN KEY (word_id2)
    REFERENCES word (_id))

Data:

language
_id | language | update_time
----------------------------
1   | English  | 2000-01-01 01:00:00
2   | French   | 2000-01-01 01:00:00
3   | Spanish  | 2000-01-01 01:00:00
4   | Turkish  | 2000-01-01 01:00:00

word
_id | language_id | word    | update_time
-----------------------------------------
1   | 1           | LOVE    | 2000-01-01 01:00:00
2   | 1           | HATE    | 2000-01-01 01:00:00
3   | 2           | AMOUR   | 2000-01-01 01:00:00
4   | 2           | HAINE   | 2000-01-01 01:00:00
5   | 1           | LUST    | 2000-01-01 01:00:00
6   | 3           | LUJURIA | 2000-01-01 01:00:00

translation
_id | word_id1 | word_id2 | update_time
---------------------------------------
1   | 1        | 3        | 2000-01-01 01:00:00
2   | 2        | 4        | 2000-01-01 01:00:00
4   | 5        | 6        | 2000-01-01 01:00:00

The client app will only download from the server DB the words and translations for the languages they have installed, and will download periodic updates, based on a timestamp on the table. I am trying to figure out a SELECT query that will get the words for both the base language (in this case, English is always installed and has language_id =1) and the other installed languages, but only the words for which there is a translation mapping between the installed languages. Thus, I do not wish to select words for which there is no mapping in the installed set of languages.

With the above data, if I have only installed French, I would only select and download:

word
_id | language_id | word    | update_time
-----------------------------------------
1   | 1           | LOVE    | 2000-01-01 01:00:00
2   | 1           | HATE    | 2000-01-01 01:00:00
3   | 2           | AMOUR   | 2000-01-01 01:00:00
4   | 2           | HAINE   | 2000-01-01 01:00:00

translation
_id | word_id1 | word_id2 | update_time
---------------------------------------
1   | 1        | 3        | 2000-01-01 01:00:00
2   | 2        | 4        | 2000-01-01 01:00:00
Community
  • 1
  • 1
asorenson
  • 169
  • 3
  • 14
  • I dont understand your question. Could you show some data and require output. Also you should mention your RDBMS – Juan Carlos Oropeza Jul 29 '15 at 02:42
  • Without data I dont see the issue. You could create a small sample with data on [SQL FIDDLE](http://sqlfiddle.com/#!3/5c2bd/3). Please read [How To Ask](http://stackoverflow.com/help/how-to-ask) – Juan Carlos Oropeza Jul 29 '15 at 03:12
  • So if user have 5 lenguaje installed and a new word appear only in 1 of those `dont download` but if is in 2 or more `download` ? or is if in english and another one `download` ? but if new word is only italian and french `dont download` ? – Juan Carlos Oropeza Jul 29 '15 at 03:19
  • Only download the relevant words and translations for the languages installed. Thus, if only French is installed, only download words in French, and the English words that have a mapped translation to a French word. Working on putting together example data to illustrate this better. – asorenson Jul 29 '15 at 03:30
  • But if French and Spanish are installed, Only download French words mapped to English, and Spanish mapped to English? No Spanish <-> French translations? – Juan Carlos Oropeza Jul 29 '15 at 03:40
  • There are only English->Other translations, not Other1->Other2. – asorenson Jul 29 '15 at 03:43
  • Perhaps I just need to make sure English is not in the list of languages to select for? – asorenson Jul 29 '15 at 04:05

1 Answers1

2

Lets remove the date variable for now.

SQL FIDDLE DEMO

select distinct w1.word_id, w1.lenguaje_id, w1.word
from 
    word w1 inner join 
    translation t on (w1.word_id = t.word_id1 or w1.word_id = t.word_id2) inner join
    word w2 on ((t.word_id1 = w2.word_id or t.word_id2 = w2.word_id) 
                 and w2.word_id <> w1.word_id)
where 
    (w1.lenguaje_id = 1 or w2.lenguaje_id = 1)
and (w1.lenguaje_id in (2,3) or w2.lenguaje_id in (2,3))
  • First Join: The word is either on left or right side in translation.
  • Second Join: Same as above but also make sure isn't the same word.
  • First Where: Make sure any of the 2 word is english
    But if you only have english->other translation you dont need this one because is always true.
  • Second Where: Make sure the other word is an leguaje installed by user <> to english.
    Same as above if english->other you should design first id always english and simplify the OR

My sample:

  • English have 6 sins (not Sloth)
  • Spanish have 5 sins (have Sloth)
  • French have 4 sins (have Sloth)
  • Spanish French both have Wrath

In this case you dont get Spanish to French sloth.
If you change the select distinct to select * in the demo you could see a litle more what is happening.

This is only the word select. You will also need a select for translation. But that is a new requirement and I'm going to bed now, but should be easy once you have this one.

Juan Carlos Oropeza
  • 47,252
  • 12
  • 78
  • 118
  • I'll put together some sample data to illustrate, but the issue here, is that there will be a large number of words in the base language (English / language_id = 1), that aren't applicable to all other languages. For example, word1 may only have a translation into French, not Indonesian, Turkish, etc, so I don't wish to download all of this unnecessary data. – asorenson Jul 29 '15 at 02:59
  • What you mean unnecessary data. In your example. `LOVE` is in eglish `AMOUR` in french. You will download both and there isnt `CINTA` in indonesia or `aşk` in Turkish yet, those will download in next update. – Juan Carlos Oropeza Jul 29 '15 at 03:04
  • Aditionally, let say you have a new word only available in English and no translation. You should download anyway so when user search for that you can say the word exist but no translation on Turkish or Indonesian. – Juan Carlos Oropeza Jul 29 '15 at 03:06
  • The date is not the issue - in this case, I put in an old date, that initially retrieves all of the data, but I have tried it with a current timestamp, and it still retrieves records that are older than the update_time. So yes, my date here is confusing the matter - I'll change it. The bigger issue is the subquery – asorenson Jul 29 '15 at 03:09
  • But if the user only installs Turkish, and there is no translation for `aşk` then I don't want to download the English record for `LOVE` – asorenson Jul 29 '15 at 03:11
  • These examples are probably confusing also. The app will hold specific niche vocabulary, that often will not have translations in other languages. Think of cheese as an example. `Brie` would have an English-French pairing, but not an Italian one. On the other hand, Italian would have `Pecorino` but this wouldn't apply to French. Does that make sense? – asorenson Jul 29 '15 at 03:14
  • Well let me think, but as I say your english customer talk to a turkish. turkish try to search `aşk` will find there isnt translation available and that is also good information. – Juan Carlos Oropeza Jul 29 '15 at 03:17
  • And for the translations, I can just use the same query, but select t values instead of w1 values, right? – asorenson Jul 29 '15 at 04:47
  • Sound correct. Test modifying the fiddle. Btw next time please include your own fiddle that help a lot. – Juan Carlos Oropeza Jul 29 '15 at 04:49