2

I was using the initcap() function in postgres when I noticed that for some names, like gunnar pålsen, it would output Gunnar PåLsen (with a capital L), other times it would do it correctly for the same name. While investigating I notice that the å was not in fact a single char (U+00E5), but rather two chars: a (U+0061) and the diacritic U+030A (ref).

Pål vs Pål

Is there an simple/elegant way to convert from two chars like this to a single char? It should work for other versions too, like é and ö.

My current solution looks like this:

select
    initcap(
        replace(
            replace(
                replace(
                    replace(
                        replace(
                            replace(
                                replace(
                                    lower(person.full_name),
                                    'á', 'á'
                                ),
                                'ö', 'ö'
                            ),
                            'å', 'å'
                        ),
                        'é', 'é'
                    ),
                    'è', 'è'
                ),
                'ä', 'ä'
            ),
            'ü', 'ü'
        )
    ) as full_name
from person

This isn't very robust, new combinations might show up. (Notice that it seems the copy-paste here has managed to convert the chars, so assume the one on the left is two chars one where there is a diacritic.)

I also tried to use translate(), but I don't think it will work because it seems to require translation between single chars, while in this case we go from two to one.

A solution for python is also acceptable. Part of the dataflow goes via python so I can fix it there if needed. A postgres solution is preferable.

André C. Andersen
  • 8,955
  • 3
  • 53
  • 79

1 Answers1

1

Based on the comment of @Boris, the following works in postgres:

select initcap(normalize('gunnar pålsen')) as full_name

Outputting the expected Gunnar Pålsen.

André C. Andersen
  • 8,955
  • 3
  • 53
  • 79