I am trying to make a bar chart race for Dataphile (my YouTube channel) about "Judo athletes with the most Olympic medals". My problem is that some athletes' names in my CSV dataset contain accents, and I can't decode them properly.
For example, on line 5 of my dataset the athlete's name is "Andreas Tölzer".
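One common cause of this kind of problem (an assumption here, since the actual output is not shown) is that the CSV's UTF-8 bytes are read with a different codec. The sketch below uses a hypothetical in-memory CSV to show what `pd.read_csv` returns with a wrong and a right value for its `encoding` parameter:

```python
import io

import pandas as pd

# Hypothetical CSV content, encoded as UTF-8 bytes.
raw = "Athlete\nAndreas Tölzer\n".encode("utf-8")

# Reading UTF-8 bytes as Latin-1 turns "ö" (bytes C3 B6) into "Ã¶".
wrong = pd.read_csv(io.BytesIO(raw), encoding="latin-1")
# Reading them with the codec they were written in keeps the name intact.
right = pd.read_csv(io.BytesIO(raw), encoding="utf-8")

print(wrong["Athlete"][0])  # Andreas TÃ¶lzer
print(right["Athlete"][0])  # Andreas Tölzer
```

If the garbled text is already stored in the file itself, and Latin-1 was the wrong codec, `name.encode("latin-1").decode("utf-8")` reverses this particular mix-up for each name.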
Here is my code:
```python
import pandas as pd

years = [str(y) for y in range(1972, 2020, 4)]
sex = ["mens", "womens"]
cat = ["extra-lightweight", "lightweight", "half-lightweight", "half-middleweight",
       "middleweight", "half-heavyweight", "heavyweight", "open-class"]

frames = []  # one small DataFrame of medallists per event
for s in sex:  # all sexes
    for c in cat:  # all weight categories
        for y in years:  # all years with Summer Olympics
            path = (r"C:\Users\joris\Coding\judo_olympics\olympics_summer_"
                    + y + "_JUD_" + s + "-" + c + "_final_standings.csv")
            try:
                df_event = pd.read_csv(path)
            except FileNotFoundError:
                continue  # no file for this category in this year
            # The first 4 rows are the medallists (judo awards two bronzes).
            df_med = df_event.head(4)[["Athlete"]].copy()
            # Cumulative count: 1 for this year and every later year, 0 before.
            for w in years:
                df_med[w] = 1 if int(w) >= int(y) else 0
            frames.append(df_med)

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end.
df_results = pd.concat(frames, ignore_index=True).groupby("Athlete").sum()
# got that from the internet
df_results.index = (df_results.index.str.normalize("NFKD")
                    .str.encode("ascii", errors="ignore")
                    .str.decode("utf-8"))
```
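Stripping accents from the index only after `groupby("Athlete").sum()` means that if one athlete is spelled both with and without an accent in different files, the two spellings are summed separately and the cleaned index can end up with duplicate names. A minimal sketch, on hypothetical toy data, that normalizes the `Athlete` column before grouping so that such rows merge:

```python
import pandas as pd

# Toy data (hypothetical): the same athlete spelled with and without an accent.
df = pd.DataFrame({
    "Athlete": ["Andreas Tölzer", "Andreas Tolzer"],
    "2008": [1, 0],
    "2012": [1, 1],
})

# Same accent-stripping steps as above, applied to the column before grouping.
df["Athlete"] = (df["Athlete"].str.normalize("NFKD")
                 .str.encode("ascii", errors="ignore")
                 .str.decode("utf-8"))

# Both spellings now share one key, so their counts are summed together.
totals = df.groupby("Athlete").sum()
print(totals)
```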
However, the output shows that the athlete's name still has not been decoded properly.
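A quick way to check whether the names were already mis-decoded before the accent stripping ran is to apply the same NFKD-plus-ASCII steps, using Python's `unicodedata` module, to a correct string and to its Latin-1 mojibake. The mojibake here is constructed for illustration, not taken from the real output, and `strip_like_question` is a hypothetical helper name:

```python
import unicodedata


def strip_like_question(s: str) -> str:
    # Same steps as the pandas line in the question: NFKD, then drop non-ASCII.
    return unicodedata.normalize("NFKD", s).encode("ascii", errors="ignore").decode("utf-8")


correct = "Andreas Tölzer"
mojibake = correct.encode("utf-8").decode("latin-1")  # "Andreas TÃ¶lzer"

print(strip_like_question(correct))   # Andreas Tolzer
print(strip_like_question(mojibake))  # Andreas TAlzer
```

The correct string loses only its diaeresis, while in the mojibake "Ã" decomposes to "A" plus a combining tilde and "¶" has no ASCII form and is silently dropped. A result like "TAlzer" therefore points at the CSV being read with the wrong encoding rather than at the stripping line.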
What I would like is to simply replace each accented letter with the same letter without the accent (for example, "é" would become "e").
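For names that were decoded correctly, the standard library alone can do this replacement. A minimal sketch with `unicodedata`, where `strip_accents` and the sample strings are hypothetical:

```python
import unicodedata

import pandas as pd


def strip_accents(text: str) -> str:
    """Replace accented letters with their base letter, e.g. "é" -> "e"."""
    # NFKD splits "é" into "e" followed by a combining acute accent (U+0301).
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep everything except the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


names = pd.Series(["Andreas Tölzer", "Café"])
print(names.map(strip_accents).tolist())  # ['Andreas Tolzer', 'Cafe']
```

Unlike `encode("ascii", errors="ignore")`, this keeps letters that have no decomposition (such as "ø") instead of silently deleting them, which makes unexpected characters easier to spot.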
There should be no letters from other alphabets in my datasets, only annoying accents.
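Because the approach relies on there being no letters from other alphabets, it may be worth verifying that assumption on the real data. A sketch that lists any names still containing non-ASCII characters after accent stripping; `leftover_non_ascii` and the sample names are hypothetical:

```python
import unicodedata


def strip_accents(text: str) -> str:
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def leftover_non_ascii(names):
    """Return the stripped names that still contain non-ASCII characters."""
    return sorted({n for n in map(strip_accents, names) if not n.isascii()})


print(leftover_non_ascii(["Andreas Tölzer", "Søren"]))  # ['Søren']
```

An empty list would confirm that accent stripping alone is enough for the dataset.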
Please let me know if you have a solution or if you need more info from my code.
Thanks!