I have a string surname.

I want to replace special Bulgarian and Polish characters with a standard English replacement.

For example surname = "Tuğba Delioğlu"

Final Output string should be: tugbadelioglu


To implement this, I have just used a series of String.replaceAll calls, as follows:

surname = surname.replaceAll("ı", "i");
surname = surname.replaceAll("ł", "l");
surname = surname.replaceAll("Ł", "l");
surname = surname.replaceAll("ń", "n");
surname = surname.replaceAll("ğ", "g");

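// Decompose accented letters first (java.text.Normalizer); without this the next line matches nothing in precomposed text
surname = Normalizer.normalize(surname, Normalizer.Form.NFD);
surname = surname.replaceAll("\\p{InCombiningDiacriticalMarks}+", ""); // this will remove diacritics

surname = surname.replaceAll("[^a-zA-Z]", ""); // remove non A-Z characters (this also removes spaces)

surname = surname.toLowerCase(); // make lowercase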

Is there a more efficient way to do this? For example, I could have an array of pairs (character to replace, character to replace it with), then loop through the string and replace each matching character with its replacement from the array.

This will be fairly high volume, so I am looking for the most efficient way to do it.

Cardinal System

1 Answer

What you could do is create a character array which would map each character onto what it should be replaced with (or the same character if no replacement is needed). Then you could go through the string (better passed as a character array) and blindly replace each character with what it should be replaced with.

Removing characters is a special case, because a char-to-char mapping cannot express "no output". You'll need a second boolean array that marks the characters to drop.

Here's a sketch of the code:

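// Character.MAX_VALUE + 1 slots, so that every possible char (including '\uFFFF') is a valid index
private static final char[] replacements = new char[Character.MAX_VALUE + 1];
private static final boolean[] removals = new boolean[Character.MAX_VALUE + 1];

static {
    // Start from the identity mapping so that unlisted characters pass through unchanged
    for (int c = 0; c < replacements.length; c++) {
        replacements[c] = (char) c;
    }
    // fill these arrays
    // like replacements['ł'] = 'l';
    // and removals[' '] = true;
}

public String replaceSpecialBulgarianCharacters(String str) {
    char[] s = str.toCharArray();
    StringBuilder sb = new StringBuilder(s.length);
    for (int index = 0; index < s.length; index++) {
        char c = s[index];
        // Skip characters marked for removal; append the mapped character otherwise
        if (!removals[c]) {
            sb.append(replacements[c]);
        }
    }
    return sb.toString();
}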
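
To make the sketch above concrete, here is one possible filled-in version. It is only an illustration under some assumptions: the class name SurnameNormalizer and the helper keep are made up for this example, the table contains only the five special characters from the question, and every character that is not explicitly listed (digits, spaces, punctuation, other accented letters) is simply dropped. Lowercasing is folded into the table so the string is traversed once. The special characters are written as Unicode escapes so that the result does not depend on the source file encoding.

```java
final class SurnameNormalizer {
    // One slot per UTF-16 code unit, so every char value is a valid index.
    private static final int TABLE_SIZE = Character.MAX_VALUE + 1;
    private static final char[] REPLACEMENTS = new char[TABLE_SIZE];
    private static final boolean[] REMOVALS = new boolean[TABLE_SIZE];

    static {
        // Assumption for this sketch: drop everything unless explicitly kept.
        java.util.Arrays.fill(REMOVALS, true);

        // ASCII letters are kept; uppercase is folded to lowercase in the table.
        for (char c = 'a'; c <= 'z'; c++) {
            keep(c, c);
        }
        for (char c = 'A'; c <= 'Z'; c++) {
            keep(c, (char) (c - 'A' + 'a'));
        }

        // The special characters listed in the question.
        keep('\u0131', 'i'); // dotless i
        keep('\u0142', 'l'); // l with stroke
        keep('\u0141', 'l'); // L with stroke
        keep('\u0144', 'n'); // n with acute
        keep('\u011F', 'g'); // g with breve
    }

    // Hypothetical helper: mark 'from' as kept and map it to 'to'.
    private static void keep(char from, char to) {
        REPLACEMENTS[from] = to;
        REMOVALS[from] = false;
    }

    static String normalize(String surname) {
        StringBuilder sb = new StringBuilder(surname.length());
        for (int i = 0; i < surname.length(); i++) {
            char c = surname.charAt(i);
            if (!REMOVALS[c]) {
                sb.append(REPLACEMENTS[c]);
            }
        }
        return sb.toString();
    }
}
```

With this table, SurnameNormalizer.normalize("Tuğba Delioğlu") should give tugbadelioglu. Any other accented letter has to be added to the table with its own keep call, otherwise it is removed instead of replaced.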
lexicore