28

I have a question that, I know, has been widely discussed about, but in my opinion, there is one aspect that still needs clarification.

I am creating a web-application with a multilanguage database, I already found some good-practices articles (such as this) and answers here in stack overflow like this.

So I decided to use a main table with the IDs of my items and another table with the translation for each item, let's say, for example

Content
ContentTranslation

or

Category
CategoryTranslation

and so on.

Right now what I'm doing? I just get the items from the database with all the translations and then I iterate over each one to look for the correct translation based on the current user's local, and if I find the correct local I set into the main object that translation for the page to render, otherwise I just get the translation that is flagged as the "default" one.

With large amounts of objects and translations, though, server response time might grow and even if the user might not notice, I don't want this.

So, is there any good practice for this use case too? For example some specific queries that say "pick the translation with locale "it" but if you don't find it just get the one with the "default" flag set?

Now for the technology I'm using Spring MVC with Hibernate and JPA (by means of JPARepository).

My objects all extend a basic Translatable class that I made this way

@MappedSuperclass
public abstract class Translatable<T extends Translation> extends BaseDTO {

    private static final long serialVersionUID = 562001309781752460L;

    private String title;

    @OneToMany(fetch=FetchType.EAGER, orphanRemoval=true, cascade=CascadeType.ALL)
    private Set<T> translations = new HashSet<T>();

    @Transient private T currentLocale;

    public void addLocale(T translation, boolean edit) {
        if (!edit)
            getTranslations().add(translation);
    }

    public void remLocale(String locale) {
        T tr = null;
        for (T candidate: getTranslations()) {
            if (candidate.getLocale().equals(locale))
                tr = candidate;
        }

        getTranslations().remove(tr);
    }

    public T getLocaleFromString(String locale) {
        if (locale == null)
            return null;
        for (T trans: translations) {
            if (trans.getLocale().equals(locale))
                return trans;
        }
        return null;
    }

    public T getDefaultLocale() {
        for (T tr: translations) {
            if (tr.isDefaultLocale())
                return tr;
        }
        return null;
    }

    public Set<T> getTranslations() {
        return translations;
    }

    public void setTranslations(Set<T> translations) {
        this.translations = translations;
    }

    public T getCurrentLocale() {
        return currentLocale;
    }

    public void setCurrentLocale(T currentLocale) {
        this.currentLocale = currentLocale;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }
}

So in my controller I iterate over the translations, find the one with the right locale and populate the "currentLocale" property, in my page I just take that and the user gets the correct language as intended.

I hope I've been clear and not messy, but if you need more informations I'll be glad to tell you more.

Andrew Marshall
  • 155
  • 1
  • 6
Luca
  • 1,116
  • 2
  • 15
  • 24
  • wouldnt that be easier to just use different database for different locale? – hi_my_name_is Nov 06 '14 at 07:40
  • I don't think that would be a good solution, since all the data must be available at the same time, so I would have to query multiple databases and if the administrator chooses to add another language to the application he'd also have to create another database... – Luca Nov 06 '14 at 09:37

2 Answers2

66

Some notes upfront:

  • my answer is more of an addition to my answer to this question, where you added a comment which then led to this question
  • in my answer I'm using C# and MS SQL Server (and I'll leave out any OR-mapping specific code)

In my applications, I use two different approaches for loading multilingual data, depending on the use case:

Administration / CRUD

In the case where the user is entering data or editing existing data (e.g. a product with its translations) I'm using the same approach as you have shown above in your question, e.g:

public class Product
{
    public int ID {get; set;}
    public string SKU {get; set;}
    public IList<ProductTranslation> Translations {get; set;}
}
public class ProductTranslation
{
    public string Language {get; set;}
    public bool IsDefaultLanguage {get; set;}
    public string Title {get; set;}
    public string Description {get; set;}
}

I.e. I'll let the OR-mapper load the product instance(s) with all their translations attached. I then iterate through the translations and pick the ones needed.

Front-end / read-only

In this case, which is mainly front-end code, where I usually just display information to the user (preferably in the user's language), I'm using a different approach:

First of all, I'm using a different data model which doesn't support/know the notion of multiple translations. Instead it is just the representation of a product in the "best" language for the current user:

public class Product
{
    public int ID {get; set;}
    public string SKU {get; set;}

    // language-specific properties
    public string Title {get; set;}
    public string Description {get; set;}
}

To load this data, I'm using different queries (or stored procedures). E.g. to load a product with ID @Id in the language @Language, I'd use the following query:

SELECT
    p.ID,
    p.SKU,
    -- get title, description from the requested translation,
    -- or fall back to the default if not found:
    ISNULL(tr.Title, def.Title) Title,
    ISNULL(tr.Description, def.Description) Description
  FROM Products p
  -- join requested translation, if available:
  LEFT OUTER JOIN ProductTranslations tr
    ON p.ID = tr.ProductId AND tr.Language = @Language
  -- join default language of the product:
  LEFT OUTER JOIN ProductTranslations def
    ON p.ID = def.ProductId AND def.IsDefaultLanguage = 1
  WHERE p.ID = @Id

This returns the product's title and description in the requested language if a translation for that language exists. If no translation exists, the title and description from the default language will be returned.

Community
  • 1
  • 1
M4N
  • 94,805
  • 45
  • 217
  • 260
  • 3
    Fantastic! I'll have to dig into Java and other projects APIs to find how to do this, but since your answer is really complete and you provide actual code & queries I'll mark this question as answered! Thanks a lot M4N! – Luca Dec 15 '14 at 21:22
  • But we still can get `NULL` in `Title` or `Description`. What if to create `NOT NULL` columns `DefaultTitle` and `DefaultDescription` in `Products` table? So, in query: `ISNULL(tr.Title, p.DefaultTitle) Title` – Ruslan K. Aug 18 '16 at 13:34
7

Using common shared table for all translatable fields of all tables

In the above approach the translation table is an extension of the parent table. Hence ProductTranslation has all the translatable fields of Product. It is a neat and quick approach and nice one as well.

But there is one disadvantage (not sure if it can be called disadvantage). If many more tables require translate-able fields, that many new tables are required. From my experience we took a different approach. We created a generic table for translation and a link table to link translations to the translate-able fields of the parent table.

So I'm going to use the previous example of Product which has two fields title and description that are translate-able to explain our approach. Also consider another table ProductCategory with fields name and description that also require translations.

Product
(
   ID: Integer
   SKU: String
   titleID: Integer // ID of LocalizableText record corresponding title
   descriptionID: Integer // ID of LocalizableText record corresponding description
)

ProductCategory
(
   ID: Integer
   nameID: Integer // ID of LocalizableText record corresponding name
   descriptionID: Integer // ID of LocalizableText record corresponding description
)

LocalizableText // This is nothing but a link table
{
    ID: Integer
}

Translations //This is where all translations are stored.
{
    ID: Integer
    localizableTextID: Integer
    language: String
    text: String
}

To load this data, I'm using different queries (modified the above). E.g. to load a product with ID @Id in the language @Language, I'd use the following query

SELECT
    p.ID,
    p.SKU,
    -- get title, description from the requested translation,
    -- or fall back to the default if not found:
    Title.text Title,
    description.text Description
  FROM Products p
  -- join requested translation for title, if available:
  LEFT OUTER JOIN Translations title
    ON p.titleID = title.localizableTextID
       AND title.Language = @Language
  -- join requested translation for description, if available:
  LEFT OUTER JOIN Translations description
    ON p.descriptionID = description.localizableTextID
       AND description.Language = @Language
  WHERE p.ID = @Id

This query is based on the assumption that individual fields of Product does not have a default translation

Similar query can be used to fetch records from ProductCategory

  • Yes, I considered having a generic table for all translations too, though as you say having multiple tables means having data scattered through the database. I don't mind this at all though, on the contrary I think it's a "good practice" to separate things that do not belong to each other. Anyway thanks for your answer too, I see that this kind of approach is really used and worth a try. I still have to find time to study it thoroughly, unfortunately – Luca May 06 '15 at 16:16
  • I think this approach is the most beautiful one from how it looks, but practically huge load will be on just one table, it may lead to a bottleneck performance issue in the system. – Yousef Jan 08 '19 at 06:51
  • 4
    I'm not really a database expert so my opinion might not be relevant, but aren't there two problems with this approach: 1) As I understand, left joins are always slower than inner joins and should be reduced. This approach will always require multiple joins dependent on the number of translated fields. With the other approach (multiple translation tables) there is only a single left join required. 2) Referential integrity is not guaranteed on the database level: a delete from Products will not automatically delete the corresponding entry in Translations, which leads to unused data garbage. – Dave Mar 06 '19 at 10:10
  • 1
    Underrated answer, imo. Really handy for creating translation files since you can dump from one table as opposed to a solution that has dozens of database tables. It also provides a lot more ability to automate conversion of platforms that weren't multi-language into multi-language. – Lawrence Johnson Oct 09 '21 at 20:39