How to replace a string with other strings when the source string contains special characters

Question

To clean up a string, I have to remove a substring that contains special (non-ASCII) UTF-8 characters.

Example:

source = "Skoda"
to_be_clean = "Škoda Rapid"

I need to remove source from to_be_clean, i.e. replace it with nothing. The catch is that to_be_clean contains a special character (Š), so it does not literally contain source. Is there a simple way to do this? Here is how I am doing it today:

output = to_be_clean.replace(source + ' ', '')

I considered a regular expression, but then I would need to list every possible special character.
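A minimal check of why the current approach removes nothing, assuming Python 3 (where `str` is Unicode): the precomposed "Š" is a single code point distinct from "S", so the literal substring "Skoda " never matches.

```python
import unicodedata

source = "Skoda"
to_be_clean = "Škoda Rapid"

# The approach from the question: a literal substring replacement.
output = to_be_clean.replace(source + " ", "")
print(output)  # Škoda Rapid (nothing was removed)

# "Š" is its own code point, not "S" plus a separate accent, so the match fails.
first = to_be_clean[0]
print(hex(ord(first)), unicodedata.name(first))  # 0x160 LATIN CAPITAL LETTER S WITH CARON
```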

It's *really* not clear what you want. Are you hoping to find a way to make `"Škoda"` equal to `"Skoda"` so that you can then remove it? There are many questions about removing accents from Unicode; have you googled those? — tripleee, Feb 21 '18 at 15:01
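One common way to make "Škoda" compare equal to "Skoda" is to decompose the text and drop the combining accent marks. A minimal sketch using only the standard library; the helper name `strip_accents` is hypothetical and not part of `unicodedata`.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical helper: remove combining marks after NFKD decomposition."""
    # NFKD splits "Š" into "S" followed by U+030C COMBINING CARON.
    decomposed = unicodedata.normalize("NFKD", text)
    # unicodedata.combining() is non-zero only for combining characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(unicodedata.normalize("NFKD", "Š") == "S\u030c")  # True
print(strip_accents("Škoda") == "Skoda")  # True
```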

score 2 · Accepted Answer · answered Feb 21 '18 at 15:10


The `unicodedata` module should solve your problem:

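import unicodedata

to_be_clean = "Škoda Rapid"

# NFKD decomposes "Š" into "S" plus a combining caron; encoding to ASCII
# with 'ignore' drops the combining mark, and decode() turns the bytes back into str.
print(unicodedata.normalize('NFKD', to_be_clean).encode('ASCII', 'ignore').decode('ASCII'))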

Output:

Skoda Rapid
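The normalization above only removes the accent; to finish the original task, the accent-stripped text can then go through the same `replace` call as in the question. A minimal sketch, assuming Python 3 and the question's example values; the helper `remove_source` is hypothetical.

```python
import unicodedata


def remove_source(source: str, to_be_clean: str) -> str:
    """Hypothetical helper: strip accents, then remove `source` plus a space."""
    # NFKD turns "Š" into "S" + COMBINING CARON; encoding to ASCII with
    # errors="ignore" drops that mark (and any other non-ASCII character).
    ascii_text = (
        unicodedata.normalize("NFKD", to_be_clean)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Same literal replacement as in the question, now on matching text.
    return ascii_text.replace(source + " ", "")


print(remove_source("Skoda", "Škoda Rapid"))  # Rapid

# Caveat: characters with no decomposition, such as "Ø", are dropped entirely.
print(unicodedata.normalize("NFKD", "Øresund").encode("ascii", "ignore").decode("ascii"))  # resund
```

Because the whole string is converted to ASCII, any accents in the remaining text are stripped as well. If that matters, filtering out only combining marks with `unicodedata.combining()` instead of encoding to ASCII keeps characters like "Ø" intact.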

Rakesh

Thanks, exactly what I was looking for. I was not aware of the unicodedata module. – Michael Feb 21 '18 at 15:32