I am a beginner with regex and would like to know how to solve the following problem with it. I am currently preprocessing German text. German has a few special characters in its alphabet (ä, ö, ü), but these letters can also be written as digraphs (ae, oe, ue). To convert the digraphs back to umlauts, I simply used the replace method, which worked fine:
import pandas as pd
df = pd.DataFrame({"text": ["Uebergang", "euer"]})
df["text"] = df["text"].str.replace("ae", "ä")
df["text"] = df["text"].str.replace("Ae", "Ä")
df["text"] = df["text"].str.replace("oe", "ö")
df["text"] = df["text"].str.replace("Oe", "Ö")
df["text"] = df["text"].str.replace("ue", "ü")
df["text"] = df["text"].str.replace("Ue", "Ü")
However, there are specific patterns where the replacement should not take place, for example in the word "euer". With the help of this post, I tried to build a working regex: Regex Pattern to Match, Excluding when... / Except between
df["text"] = df["text"].str.replace(r"[AaÄäEe]ue|(ue)", "ü", regex=True)
The idea: if "ue" is preceded by any of the characters in the brackets [AaÄäEe], I would like to exclude that case; otherwise "ue" should be replaced by "ü". But this doesn't produce the desired result. How can this be done? Thanks in advance.
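
For reference, the intended behaviour can be sketched with a negative lookbehind. In an alternation such as `[AaÄäEe]ue|(ue)`, whichever branch matches is replaced as a whole, so "eue" would still be turned into "ü" unless the replacement is a function that checks the capture group. A lookbehind avoids this by never including the preceding character in the match. This is only a minimal sketch: the helper name `restore_ue` and the extra sample words are assumptions made for illustration, and the character class is taken unchanged from the attempt above, so it is not a complete rule for German (a word like "Quelle" would still be converted incorrectly).

```python
import re

import pandas as pd

# Negative lookbehind: match "ue" only if the character directly before it
# is NOT one of A, a, Ä, ä, E, e. The lookbehind consumes no characters,
# so only "ue" itself is replaced.
UE_PATTERN = re.compile(r"(?<![AaÄäEe])ue")


def restore_ue(text: str) -> str:
    """Hypothetical helper: turn "ue"/"Ue" into "ü"/"Ü", skipping excluded cases."""
    text = UE_PATTERN.sub("ü", text)
    # Capitalised "Ue" is handled as a plain literal replacement.
    return text.replace("Ue", "Ü")


# Sample data: the two words from the question plus two assumed examples.
df = pd.DataFrame({"text": ["Uebergang", "euer", "Mueller", "neue"]})
df["text"] = df["text"].map(restore_ue)
print(df["text"].tolist())
```

With these sample values, "Uebergang" and "Mueller" become "Übergang" and "Müller", while "euer" and "neue" are left unchanged. The same lookbehind idea could be applied to "ae" and "oe" if they turn out to need exclusions as well.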