Schema for a multilingual database

Question

I have seen a couple of questions around this problem (e.g. here and here) but just wanted to clarify an alternative approach

What we have ATM

Product

Id
Name

And now we need to add support for few languages ( this is a SaaS solution and a given customer might have couple of languages max 3-4. And each SaaS tenant have their own database)

In all the other proposed solutions, it involves modifying the schema ( in one way or the other). And worse, the search becomes much more complicated. Instead, what I was thinking is storing all translatable values in the same column (in this case : Name). So a typical row which has content for two languages would look like below

Id | Name

11 | en : Banana ∆ es: Plátano ∆ zh : 香蕉

∆ - is an example splitter ( this will be a proper special character which will not conflict with user inputs )

This way the search is much more simplified and the amount of change I have do on the schema is minimal. Everything else ( update/retrieval based on selected users locale can be managed via the code(C#) )

So my question is, is this an acceptable approach ? If not, what problems you see in this ?

Possibly of interest: http://dba.stackexchange.com/questions/10005/database-behind-a-multilingual-user-interface — Mike Nakis, Dec 09 '15 at 23:55

Mike Nakis · Answer 1 · 2015-12-10T00:16:52.900

1

The major problem with this approach is that in order to search for a product by name, you are going to have to use LIKE '%(productName)%' which will do a full table scan. Full table scans give extremely bad performance. This will not just perform badly, this is the stuff Denial-Of-Service attacks are made of.

You might be able to get better than abysmally bad performance by utilizing a full text index. I do not have enough experience with full-text indexes so as to advise on those, but my educated guess is that you are likely to run into trouble with collation, since you have different languages on the same column.

edited Dec 10 '15 at 00:16

answered Dec 09 '15 at 23:58

Mike Nakis

56,297
11
110
142

Mike, thanks for the quick response. I understand the problem around collation. Let's assume the performance is not a huge problem here and in most cases a tenant would endup having one language and there aren't that many records in a table either. But about the full table scan problem, isn't that how it would happen anyways on a like query. Wouldn't a non-clustered index on productName improve the performance there ? – Illuminati Dec 10 '15 at 00:09
btw, what is the point @PeterM mentioned that I can't seem to see here. – Illuminati Dec 10 '15 at 00:12
No it would not! It would only improve the performance for the language which appears first in the string. – Mike Nakis Dec 10 '15 at 00:13
PeterM had a comment here which he deleted. It was to the effect that you will not have ordering, either. (Which is another way of saying that nothing past the first string can be indexed.) – Mike Nakis Dec 10 '15 at 00:14

Schema for a multilingual database

1 Answers1