1

I have a table of products, and a separate table which lists various translations for those products. The basic setup looks something like this:

dbo.Products
ProductID | Description
1         | An amazing product
2         | An even more amazing product
3         | Don't buy this one

dbo.ProductTranslations
ProductID | Language | Description
1         | EN-US    | null
1         | FR-CA    | Un produit étonnant
2         | EN-US    | null
2         | FR-CA    | Un produit encore plus étonnant
3         | EN-US    | null
3         | FR-CA    | Ne pas acheter celui-ci

What I'd like to do is basically flatten that out so all the products and translations are in one table and condensed into a single row for each product like this:

dbo.FlattenedProducts
ProductID | EN-US_Description            | FR-CA_Description
1         | An amazing product           | Un produit étonnant
2         | An even more amazing product | Un produit encore plus étonnant
3         | Don't buy this one           | Ne pas acheter celui-ci

I've dug through examples like this one about pivoting and this one where the best suggestion (performance-wise) seems to be using a slew of case statements, but I'm struggling to apply them to my situation. Using case statements like in that second link lets me make the translations linear, but it doesn't reduce the total number of items. I'm working with 40 million rows (and ultimately a dozen or so joins) so speed is critical and using a select distinct from all this data just kills performance, I need to have the data already sanitized right out of the select statement.

Is there some trick to this I'm missing?

Edit: I should note that sometimes there are real values in the translated description when it's 'EN-US' (don't ask why, I didn't design this), and those need to override the original description so I can't just SELECT WHERE Language != 'EN-US' or similar. Also there are more languages than just English and French, I've just simplified it for demonstration purposes.

Community
  • 1
  • 1
thanby
  • 323
  • 1
  • 6
  • 22

2 Answers2

1

It seems like you need to return description for English from your products table. Use conditional sum aggregation:

SELECT
  p.ProductID,
  MAX(CASE WHEN pt.Language = 'EN-US' THEN COALESCE(pt.Description, p.Description) END) AS ENUS_Description,
  MAX(CASE WHEN pt.Language = 'FR-CA' THEN pt.Description END) AS FRCA_Description
FROM dbo.Products p
LEFT JOIN dbo.ProductTranslations pt ON p.ProoductID = pt.ProductID 
GROUP BY p.ProductID

Edit: Added COALESCE() as per your edit to give higher priority to English translation from translations table - if it's null, then take the original description from products table.

For more languages just add corresponding lines that will look like the one for FR-CA but with different language value.

Kamil Gosciminski
  • 16,547
  • 8
  • 49
  • 72
  • Out of curiosity, what's the purpose of the `GROUP BY` statement in there? If I leave it out I get an error but I'm not sure why it's necessary to begin with. – thanby Oct 05 '16 at 18:43
  • 1
    It will return one row for each group (here: ProductID). Then you just use aggregate functions like MAX() to specify that you want maximum value for the given group (there is only one for each language). – Kamil Gosciminski Oct 05 '16 at 21:52
1

The simplest and fastest is either conditional aggregation or a join:

select pt.productid,
       max(case when language = 'EN-US' then description end) as en_us,
       max(case when language = 'CA-FR' then description end) as ca_fr
from ProductTranslations pt
group by productid;

You can add into FlattenedProducts if you want to create a table.

The join method is:

select coalesce(pte.productid, ptf.productid) as productid,
       pte.description as us_en,
       ptf.description as ca_fr
from ProductTranslations pte full join
     ProductTranslations ptf
     on pte.productid = ptf.productid and
        pte.language = 'EN-US' and ptf.language = 'CA-FR';

This returns all products. If you know that each product has at least one of each translation, then use inner join rather than full join -- it is more efficient.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786