
I have to implement a function that takes a std::string and returns a std::string consisting only of the letters a-z:

std::string Convert(const std::string& strWithSpecialChars);

For example this test should pass:

"kilicdaroglu" == Convert("Kılıçdaroğlu");

I am not sure whether my point of view is correct, i.e. that ğ can be considered a special form of g. I have to support not only Turkish letters, but also French accented letters and German special letters such as ÄÖÜäöü.

The background is a validation of the German social security number, which includes the first letter of the family name (only a-z). I want to validate that letter against the family name.

Botje
Jan Hackenberg
  • If you don't convert `ğ` to `g`, what would you convert it to? – Eljay Jun 08 '23 at 11:11
  • Other than having a 'map' of what all potential special characters should be converted to, I don't think the way the Unicode system works allows an easy option for this. – Adrian Mole Jun 08 '23 at 11:13
  • For example, does `æ` map to `a` or `e`? – Adrian Mole Jun 08 '23 at 11:14
  • _"German social security number"_ Well, then you'll need to find out how the officialities, who hand out these numbers, do that mapping actually. That should be publicly documented somewhere. – πάντα ῥεῖ Jun 08 '23 at 11:18
  • For extra fun: [which normal form is your input data and what normal form do you intend to produce](https://stackoverflow.com/questions/15985888/when-to-use-unicode-normalization-forms-nfc-and-nfd) – Botje Jun 08 '23 at 11:31
  • This feels like an XY-Problem. Why are you trying to do this? The proposed conversion results in information loss that you can never recover and may map 2 originally different strings onto the same output string. What does `ß` map to? – Richard Critten Jun 08 '23 at 11:32
  • Pretty sure not all of those chars are representable as `char`, so you'll already have to use a different kind of string literal/`std::basic_string` instantiation as input. (`L'ç'` has value 231 for me, so this is larger than the char max used by my compiler (127).) And I don't think the standard library contains a "looks similar" functionality for characters; you'll probably need to implement the mapping yourself... – fabian Jun 08 '23 at 11:33
  • Related: [How to remove accents and tilde in a C++ std::string](https://stackoverflow.com/q/144761/10871073) – Adrian Mole Jun 08 '23 at 12:01
  • This is an interesting problem. French `ï` should become `i`. But German `ï` should become `ie`. French `æ` should become `ae`. German `ß` should become `ss`. Turkish `ı` or `İ` should become `i`. Is there something that indicates if the word being converted is French, German, or Turkish? – Eljay Jun 08 '23 at 12:47
  • Note: you are describing some Latin characters. Accented characters are considered part of the Latin script, so you probably googled for the wrong terms. Search for e.g. how to remove accents from a string. – Giacomo Catenazzi Jun 09 '23 at 08:33
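
Following the suggestion in the comments to keep a 'map' of what each special character should be converted to, here is a minimal sketch, not a definitive implementation. It assumes the input `std::string` holds UTF-8 in precomposed form (NFC). The helper names `Replacements` and `NextCodePoint` are hypothetical, and the table entries (for example `ä` to `ae`, `ß` to `ss`) are illustrative assumptions, not the official rules used for the social security number.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical hand-written table: Unicode code point -> ASCII replacement.
// The entries are assumptions for illustration; extend or change as needed.
static const std::unordered_map<char32_t, std::string>& Replacements() {
    static const std::unordered_map<char32_t, std::string> table = {
        {U'\u00E4', "ae"}, {U'\u00C4', "ae"},  // ä Ä
        {U'\u00F6', "oe"}, {U'\u00D6', "oe"},  // ö Ö
        {U'\u00FC', "ue"}, {U'\u00DC', "ue"},  // ü Ü
        {U'\u00DF', "ss"},                     // ß
        {U'\u00E7', "c"},  {U'\u00C7', "c"},   // ç Ç
        {U'\u011F', "g"},  {U'\u011E', "g"},   // ğ Ğ
        {U'\u0131', "i"},  {U'\u0130', "i"},   // ı İ
        {U'\u015F', "s"},  {U'\u015E', "s"},   // ş Ş
        {U'\u00E9', "e"},  {U'\u00E8', "e"},   // é è
        {U'\u00EA', "e"},  {U'\u00EB', "e"},   // ê ë
        {U'\u00E0', "a"},  {U'\u00E2', "a"},   // à â
        {U'\u00E6', "ae"},                     // æ
    };
    return table;
}

// Decodes the UTF-8 sequence starting at s[i] and advances i past it.
// A malformed sequence yields U+FFFD and advances by one byte.
// Overlong encodings are not rejected in this sketch.
static char32_t NextCodePoint(const std::string& s, std::size_t& i) {
    assert(i < s.size());
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t len = 1;
    char32_t cp = lead;
    if (lead < 0x80) {
        len = 1;
    } else if ((lead & 0xE0) == 0xC0) {
        len = 2; cp = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {
        len = 3; cp = lead & 0x0F;
    } else if ((lead & 0xF8) == 0xF0) {
        len = 4; cp = lead & 0x07;
    } else {
        ++i;
        return 0xFFFD;
    }
    if (i + len > s.size()) { ++i; return 0xFFFD; }
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char c = static_cast<unsigned char>(s[i + k]);
        if ((c & 0xC0) != 0x80) { ++i; return 0xFFFD; }
        cp = (cp << 6) | (c & 0x3F);
    }
    i += len;
    return cp;
}

std::string Convert(const std::string& strWithSpecialChars) {
    std::string out;
    std::size_t i = 0;
    while (i < strWithSpecialChars.size()) {
        const char32_t cp = NextCodePoint(strWithSpecialChars, i);
        if (cp >= U'a' && cp <= U'z') {
            out += static_cast<char>(cp);
        } else if (cp >= U'A' && cp <= U'Z') {
            out += static_cast<char>(cp - U'A' + U'a');  // ASCII lowercase
        } else {
            const auto it = Replacements().find(cp);
            if (it != Replacements().end()) out += it->second;
            // Everything else (digits, spaces, unmapped characters) is dropped.
        }
    }
    return out;
}
```

With a UTF-8 source and execution character set, `Convert("Kılıçdaroğlu")` returns `"kilicdaroglu"` under this table. The sketch does not address the language-dependent cases raised in the comments (one table cannot map `ï` differently for French and German), and decomposed input (NFD) would lose its combining marks, so `a` followed by a combining diaeresis would come out as `a` rather than `ae`.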

0 Answers