2

I need to translate the special characters of a string column into 'raw' characters, using Amazon Presto. In SQL Server, I could do that using the TRANSLATE function:

select
city_name,
translate(city_name,'áéíóúÁÉÍÓÚäëïöüÄËÏÖÜâêîôûÂÊÎÔÛãẽĩõũÃẼĨÕŨ','aeiouAEIOUaeiouAEIOUaeiouAEIOUaeiouAEIOU') as cityname_raw
FROM cities

That would return exactly what I need:

city_name

Goiânia
São Paulo
Palhoça
Várzea Paulista

city_name_raw

Goiania
Sao Paulo
Palhoca
Varzea Paulista

Is there an alternative to the TRANSLATE() function in Amazon Presto?

Update:

As suggested, I used lots of nested replaces and it worked out:

replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(name,'ü','u'),'í','i'),'ê','e'),'ú','u'),'õ','o'),'ô','o'),'ç','c'),'é','e'),'ã','a'),'á','a'),'â','a'),'ó','o')

It isn't a pretty query but it works.

2 Answers2

2

Trino (f.k.a as PrestoSQL) offers this functionality since Trino 337.

see https://trino.io/docs/current/functions/string.html#translate

Piotr Findeisen
  • 19,480
  • 2
  • 52
  • 82
  • This an an appalling deficit. How would they forget to implement something like this, when such functionality exists in so many environments where ETL is needed? – Michael Tuchman Jun 12 '21 at 23:17
  • 1
    @MichaelTuchman i updated the answer. The functionality has been implemented since my initial answer. Thanks for helping me keep this up to date. – Piotr Findeisen Jun 18 '21 at 07:58
0

Personally, I like this function. It can translate any special character you throw at it. Inspired by this solution: BigQuery: Convert accented characters to their plain ascii equivalents

select regexp_replace(normalize('Goiânia, São Paulo, Palhoça, Várzea Paulista', NFD), '\pM')

-- result: 'Goiania, Sao Paulo, Palhoca, Varzea Paulista'

Basically you remove the "special character flag" from the digit.

Denn0
  • 377
  • 3
  • 15