How to change case of a UTF file

Question

I have a UTF file in uppercase, and I want to change all words to lowercase.

I have tried:

`tr '[:upper:]' '[:lower:]' < input.txt > output.txt`

But that changes only characters without an accent.

Maybe this belongs on SuperUser? – Kieren Johnstone Jul 17 '10 at 09:49
Sure, my mistake, but I have no idea how to move it. – liborw Jul 17 '10 at 10:13

score 4 · Answer 1 · edited Jun 21 '22 at 11:22

In the end, the simplest way I found is to use AWK:

awk '{print tolower($0)}' < input.txt > output.txt

answered Jul 17 '10 at 09:35 by liborw · edited by Peter Mortensen

This is, indeed, the "correct" way to go about it, since `awk` is Unicode-aware and `tr` isn't. This should be the accepted answer. – DevSolar Dec 15 '14 at 09:29

score 1 · Accepted Answer · answered Jul 16 '10 at 17:14

This is because the default character classes only work on standard ASCII, which does not include accented characters. If you have a known, limited set of those characters, the easiest way is to add the mapping from each accented uppercase character to its lowercase counterpart manually:

tr 'ÄÖÜ[:upper:]' 'äöü[:lower:]'

If you only have a few accented characters, this is workable.
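
One caveat: a byte-oriented `tr` (GNU `tr` is one) applies a multibyte mapping byte by byte, which can alter unrelated characters that happen to share those bytes. A minimal sketch of the same manual-mapping idea that avoids this, assuming UTF-8 input and only the three umlauts shown above, replaces each accented letter as a whole byte sequence with `sed` and leaves the ASCII letters to `tr`. The function name `lower_with_map` is hypothetical.

```shell
# Hypothetical helper: manual uppercase-to-lowercase mapping for a small,
# known set of accented letters (here Ä, Ö, Ü), plus plain ASCII A-Z.
# LC_ALL=C makes both tools work on raw bytes: sed swaps each complete
# UTF-8 sequence, and tr touches only the ASCII range A-Z.
lower_with_map() {
    LC_ALL=C sed -e 's/Ä/ä/g' -e 's/Ö/ö/g' -e 's/Ü/ü/g' |
        LC_ALL=C tr 'A-Z' 'a-z'
}
```

Each further accented letter needs its own `-e 's/X/x/g'` expression, so this stays practical only for short lists.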

score 0 · Answer 3 · Dennis Williamson · 2010-07-16T21:20:36.287

No, the issue is that `tr` is not Unicode-aware.

$ grep -o '[[:upper:]]' <<< JalapeÑo
J
Ñ
$ tr '[:upper:]' '[:lower:]' <<< JalapeÑo
jalapeÑo

The reason to use [:upper:], etc., is to handle characters outside ASCII. Otherwise, you could just use [A-Z] and [a-z]. That's also why PCRE has a character class called [:ascii:]:

$ perl -pe 's/[[:ascii:]]//g' <<< jalapeño
ñ

answered Jul 16 '10 at 21:02 · edited Jul 16 '10 at 21:20

You're right! But character classes have never worked for me so far, neither in Unicode nor in Latin-1, so I gave up on them a long time ago and always do it manually :-( – JeSuisse Jul 17 '10 at 10:42
