I'm creating a language app that currently only features Mandarin Chinese and Spanish.

Currently, I have a self-made dictionary that is simply loaded as JSON rather than stored in the DB, but I've found full downloadable dictionaries, such as CEDICT for Chinese, that can supply the definitions for me. However, the CEDICT file is 115k rows long, with 6 columns per row.

I also need to do this for Spanish, and then every other language I plan on including.

Notes:

  • MySQL DB
  • Laravel ORM (PHP)

That being said, what's the best way to store this data?

I'm assuming separate tables (dictionary_zh, dictionary_es), but I could also store every dictionary in a single dictionary table with an added locale column and query on that. This SO answer states that 1M records isn't "too much" for a table to handle; it simply depends on how you index the table.


Btw, anyone have a recommendation for a good downloadable Spanish - English dictionary?


Note: I'm downloading the dictionary and cutting it up into something I can load into a CSV

Traditional  Simplified  Pinyin  Meaning       Level  Quest
佟           佟          Tong2   surname Tong  1      2
...

I'm translating by simply passing in the identifying term (the character, in this case) and grabbing its Meaning.

user3871
  • Keep your dictionaries separate; you'll have to apply changes-only updates later as the data gets updated. Program the same interface for all dictionaries, something like `$dictionaryObj->translate( $term )`, so that you have one simple way to translate. If you tell us more about how the translation works, I can try to give you more feedback. – VladimirGhetau Jul 04 '17 at 22:40
  • @VladimirGhetau see above – user3871 Jul 04 '17 at 22:44

1 Answer

I would store each dictionary in a separate table and abstract how the definition of a word is fetched for each locale, so that callers don't need to know how a given dictionary (mapped as the Dictionary type in the diagram below) performs its translation. This is also useful when you have dictionaries that don't reside in your DB, such as ones that translate via an API.

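[UML diagram: a Dictionary type with a translate() method, implemented by ChineseDictionary and SpanishDictionary]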

The method translate() is implemented differently for each type of Dictionary (in your case ChineseDictionary or SpanishDictionary).
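
As a rough illustration of that idea, here is a minimal sketch in plain PHP. It assumes in-memory arrays standing in for the dictionary_zh and dictionary_es tables; the `dictionaryFor()` helper, the array keys, and the Spanish sample entry are hypothetical names and data made up for the example, not part of Laravel or CEDICT.

```php
<?php
declare(strict_types=1);

// Common contract: callers only ever see translate().
interface Dictionary
{
    public function translate(string $term): ?string;
}

// Stand-in for lookups against a dictionary_zh table (CEDICT-style rows).
final class ChineseDictionary implements Dictionary
{
    /** @var array<int, array<string, string>> */
    private array $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function translate(string $term): ?string
    {
        foreach ($this->rows as $row) {
            // A Chinese entry can be found by either script.
            if ($row['simplified'] === $term || $row['traditional'] === $term) {
                return $row['meaning'];
            }
        }
        return null; // term not in this dictionary
    }
}

// Stand-in for lookups against a dictionary_es table (assumed: one headword per entry).
final class SpanishDictionary implements Dictionary
{
    /** @var array<string, string> */
    private array $entries;

    public function __construct(array $entries)
    {
        $this->entries = $entries;
    }

    public function translate(string $term): ?string
    {
        return $this->entries[$term] ?? null;
    }
}

// Hypothetical helper: pick the dictionary registered for a locale.
function dictionaryFor(string $locale, array $dictionaries): Dictionary
{
    if (!isset($dictionaries[$locale])) {
        throw new InvalidArgumentException("No dictionary registered for locale: {$locale}");
    }
    return $dictionaries[$locale];
}

$dictionaries = [
    'zh' => new ChineseDictionary([
        ['traditional' => '佟', 'simplified' => '佟', 'meaning' => 'surname Tong'],
    ]),
    // Illustrative sample entry only.
    'es' => new SpanishDictionary(['hola' => 'hello']),
];

echo dictionaryFor('zh', $dictionaries)->translate('佟'), PHP_EOL; // prints "surname Tong"
```

In a real Laravel app each class would presumably query its own table (or call an external API) inside translate(), while the calling code stays the same.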

Another advantage of this approach, from a data-management point of view, is that you won't need to perform many operations on the data when a new version of a dictionary is released, which makes it cheap to maintain.