
I need help deciding on a DB design approach. We're building a translation tool with Hanami (the Ruby web framework) and thus ROM. We are facing the design decision of having one DB (PostgreSQL) table for translation records, where each record is for one source and one target language combination. However, source and target may be any languages: EN-DE, FR-EN, etc.

The other possibility would be a DB table per language pair.

We currently have about 1,500,000 legacy records. We will not reach 2,000,000 soon, but still, we need to consider it.

We are inclined toward the first option, but would it be feasible in terms of querying and performance? The main difference is that with option one the matching language pair must be queried first, and then the query for the corresponding translation string is triggered.
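To make the first option concrete, here is a minimal plain-Ruby stand-in for the single-table lookup (the column names are only an illustration, not our actual schema):

```ruby
# Hypothetical rows as they would live in one translations table.
TRANSLATIONS = [
  { source_lang: "EN", target_lang: "DE", source_text: "house",  target_text: "Haus"  },
  { source_lang: "FR", target_lang: "EN", source_text: "maison", target_text: "house" },
]

# One pass filters on the language pair and the source string together --
# in SQL this would presumably be a single WHERE clause, not two queries.
def lookup(source_lang, target_lang, source_text)
  row = TRANSLATIONS.find do |r|
    r[:source_lang] == source_lang &&
      r[:target_lang] == target_lang &&
      r[:source_text] == source_text
  end
  row && row[:target_text]
end
```

Presumably, against PostgreSQL the same filter becomes a single WHERE over (source_lang, target_lang, source_text), so the two steps I described would collapse into one query.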

Would there be a significant difference in performance between the two options?

Thank you

seba

Sebastjan Hribar

2 Answers


The first approach will be the most flexible, since you will be able to add language combinations in the future without schema changes. The second approach would mean adding a table for every language combination, which would be both a maintenance nightmare and require complex code to query multiple tables (which can also mean dynamic queries, resulting in poor performance).

PostgreSQL should be able to handle 1,500,000 records like a breeze, provided you have enough hardware and have done proper performance configuration. I have worked with PostgreSQL tables of 50 million rows, and they perform well.
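For example, with one table a single composite index can cover the whole lookup. A sketch of what that might look like as a ROM::SQL migration (the table and column names are assumptions, not from the question):

```ruby
ROM::SQL.migration do
  change do
    create_table :translations do
      primary_key :id
      column :source_lang, String, null: false
      column :target_lang, String, null: false
      column :source_text, String, null: false, text: true
      column :target_text, String, null: false, text: true

      # One composite index serves lookups by language pair plus source string;
      # there is no need to index every column individually.
      index %i[source_lang target_lang source_text]
    end
  end
end
```

With an index like that in place, lookups should stay fast whether the table holds 1.5 million or 50 million rows.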

Shameel
  • I agree, for PG 1,500,000 records aren't hard to deal with. As with any measurable subject, you may want to run benchmarks to see which configuration works better. Another suggestion is to use EXPLAIN, in order to check whether you need indexes, or whether your query is performant in general. – Luca Guidi Sep 23 '19 at 14:43
  • You would naturally index the table; then performance is virtually independent of the table size. – Laurenz Albe Sep 23 '19 at 15:09
  • Thank you all. The table would definitely be indexed and we're currently looking at 6 to 8 language combinations. – Sebastjan Hribar Sep 23 '19 at 16:20

You can normalize your DB schema and avoid redundant data.

In many cases it is convenient to have several small tables instead of one huge table, but it depends on your system; there is no single solution that fits every case.

We must also use indexes responsibly. We should not create an index for every field or combination of fields, since, although they spare us from scanning the entire table, they use disk space and add overhead to write operations.

Another very useful tool is connection pool management. If we have a system under heavy load, a pool lets us avoid saturating the database with connections and allows us to reuse them.
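As a sketch, in a ROM setup the pool size can be set via the SQL gateway options, which, as far as I know, are forwarded to Sequel's connection pool (the URL and size here are placeholders):

```ruby
require "rom"
require "rom-sql"

# max_connections is forwarded to Sequel, which manages the pool.
config = ROM::Configuration.new(
  :sql,
  "postgres://localhost/translations_db",
  max_connections: 10
)
rom = ROM.container(config)
```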

Rakesh