I will start by saying that I have NO INFLUENCE on the input, so suggestions to correct it can't help me. I am asking how to fix the output.
I have descriptions in German. The problem is that some of them were corrupted in the process. Words that contain one of the 7 German special letters (ä, ö, ü, Ä, Ö, Ü, ß) can contain corrupted chars like:
('%�%')
('%¿%')
('%Ø%')
('%¶%')
('%Â%')
('%Ã%')
('%©%')
A further difficulty is that one letter can be "translated" to a single corrupted char or to as many as 3 corrupted chars. So the word "für" can be corrupted to "fÂr" or to "f??r" or to "f�r", and I don't have any specific pattern that I can use in a regex.
I need to build an algorithm that:
- Finds a corruption in a given description.
- Finds the correction for the corrupted word.
What do I have?
- The description
- A German dictionary with all the words that have special chars.
I would like to implement it in PHP\Queries, but that's not mandatory. Any ideas how to do it?
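To make the two steps concrete, here is a rough sketch of one possible approach, written in Python purely for illustration (the same regex idea should carry over to PHP's `preg_*` functions). It rests on assumptions that are not given above: every run of characters that is not a plain letter or a German special letter is treated as standing for one or more special letters, punctuation at the edges of a token is assumed to be genuine, and the dictionary is a plain list of words. The names `candidates` and `repair` are made up for this example.

```python
import re

SPECIAL = "ÄÖÜäöüß"  # the 7 German special letters
# Assumption: any run of characters outside A-Z, a-z and SPECIAL may be corruption.
SUSPECT_RUN = re.compile(r"[^A-Za-z" + SPECIAL + r"]+")
# Assumption: these characters at the edges of a token are real punctuation.
EDGE_PUNCT = ".,;:!()\"'"


def candidates(token, dictionary):
    """Return the dictionary words that fit the token when every suspect run
    is read as one or more German special letters."""
    parts = SUSPECT_RUN.split(token)
    if len(parts) == 1:
        return []  # no suspect characters, nothing to repair
    wildcard = "[" + SPECIAL + "]+"
    pattern = re.compile(wildcard.join(re.escape(p) for p in parts))
    return [word for word in dictionary if pattern.fullmatch(word)]


def repair(description, dictionary):
    """Replace each corrupted word that has exactly one dictionary candidate."""
    def fix(match):
        token = match.group(0)
        core = token.strip(EDGE_PUNCT)
        found = candidates(core, dictionary)
        if len(found) != 1:
            return token  # unknown or ambiguous: leave it untouched
        return token.replace(core, found[0], 1)

    # \S+ visits every whitespace-separated token and keeps the spacing intact.
    return re.sub(r"\S+", fix, description)


words = ["für", "Größe", "schön"]
print(repair("Tasche fÂr Damen, GrÃ¶ÃŸe M, sch?n.", words))
# prints: Tasche für Damen, Größe M, schön.
```

This deliberately leaves ambiguous cases alone: "M?ller" with both "Müller" and "Möller" in the dictionary is not changed. It also does not handle a real "?" directly after a corrupted word, hyphenated compounds, or words whose capitalisation differs from the dictionary entry, so those would need extra rules.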