
I have a regex that transforms an uppercase/lowercase string into a capitalized string. The problem is that in my country it's normal to have special characters in names, and that breaks the result.

const updatedInput = input
  .replace(/\w+/g, (txt) => {
    return txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase();
  })
  .trim();

If I use this method with "JOAO CARLOS NOBREGA ", it returns "Joao Carlos Nobrega". But if I use it with "JOÃO CARLOS NOBREGA ", it returns "JoãO Carlos Nobrega". How can I solve this?

Martin

1 Answer


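There are several ways to match word characters. As you found out, the predefined character class \w is useless as soon as diacritics are involved, as in João, Günther, øl, or René. In JavaScript it matches only [A-Za-z0-9_], so an accented letter ends the match and the letter after it is treated as the start of a new word.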

I would probably do something like this:

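// ASCII letters plus the accented letters of the Latin-1 range (× and ÷ are skipped)
const letter = '[A-Za-zÀ-ÖØ-öø-ÿ]';
const nonLetter = '[^A-Za-zÀ-ÖØ-öø-ÿ]';
// A letter preceded by a non-letter or by the start of the string
// (lookbehind needs an ES2018+ engine)
const initialLetter = `(?<=${nonLetter}|^)${letter}`;
const capitalize = (string) => string.replace(
  new RegExp(initialLetter, 'g'),
  (match) => match.toUpperCase(),
);
// Lowercase everything first, then uppercase only the initial letters
const updatedInput = capitalize(input.trim().toLowerCase());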

I tested with JANA GÜNTHER-ÄLZBÄCHER and got Jana Günther-Älzbächer and I tested with JOÃO CARLOS NOBREGA and got João Carlos Nobrega.
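If your runtime supports Unicode property escapes (the `u` flag, ES2018+), a variation on the same idea is to let the regex engine decide what a letter is instead of listing ranges by hand. This is only a sketch, not a drop-in replacement: `capitalizeWords` is a name I made up for illustration, and I am assuming you want every letter that follows a non-letter to be capitalized, as in the code above.

```javascript
// Hypothetical helper (name chosen for illustration only).
// \p{L} matches any Unicode letter, \p{M} any combining mark.
// The negative lookbehind keeps only letters that do not continue a word.
const capitalizeWords = (text) =>
  text
    .trim()
    .toLowerCase()
    .replace(/(?<![\p{L}\p{M}])\p{L}/gu, (first) => first.toUpperCase());

console.log(capitalizeWords('JOÃO CARLOS NOBREGA '));
console.log(capitalizeWords('JANA GÜNTHER-ÄLZBÄCHER'));
```

The difference from the hand-written ranges shows up with letters outside the Latin-1 block, such as the Ł in ŁUKASZ, which the explicit character class would treat as a non-letter.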

Martin