How can I match a Pandas Dataframe with invalid characters (accents) to an array?

Question

I have been trying to use a pandas DataFrame in Python 3 to find the id that matches a name from a CSV file. The API I am reading from gives me names such as António with the accent, the way I need them, in a column called "first". I also have an array of names that won't necessarily have all of the accents I need to match. The program works for every name I try except the ones whose accented characters differ between the two sources.

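import pandas as pd

nameArray = ["Antonio", "Matt", "Mark", "Raul"]
playersUrl = 'https://www.FakeSite.com/players'
playerData = pd.read_csv(playersUrl, names=["PLAYERID", "FIRSTNAME"])

def find_player_id(playerData, nameArray):
    # Return the id of the first player whose first name is in nameArray
    for first, playerid in zip(playerData["FIRSTNAME"], playerData["PLAYERID"]):
        for testName in nameArray:
            # Exact comparison: "António" != "Antonio", so accented names never match
            if first == testName:
                return playerid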

Are you saying you want to match names with or without accents? — Stephen Rauch, Jan 03 '17 at 02:44
The playerData dataframe from the CSV has accented words. The array is filled with unaccented words, and I need them to match. — Dan, Jan 03 '17 at 02:47

score 1 · Answer 1 · edited May 23 '17 at 12:16

If you want to compare strings without diacritics, see this previous SO post:

Unidecode is the correct answer for this. It transliterates any Unicode string into the closest possible representation in ASCII text.
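
As a rough sketch of the same idea, the snippet below strips accents from both sides before comparing. Note that it does not use Unidecode: it swaps in the standard library's unicodedata module (NFKD normalization, then dropping combining marks), which handles accented Latin letters but is not a full transliteration. The helper names strip_accents and find_player_id and the sample rows are hypothetical stand-ins for the CSV data in the question.

```python
import unicodedata

def strip_accents(text):
    """Return text with combining diacritical marks removed."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def find_player_id(rows, names):
    """Return the id of the first row whose accent-free name is in names."""
    wanted = {strip_accents(name) for name in names}
    for player_id, first in rows:
        if strip_accents(first) in wanted:
            return player_id
    return None

# Hypothetical sample rows (PLAYERID, FIRSTNAME) standing in for the CSV data
rows = [(7, "José"), (12, "António")]
print(find_player_id(rows, ["Antonio", "Matt", "Mark", "Raul"]))
```

With a DataFrame, the same helper could presumably be applied to the whole column first, for example playerData["FIRSTNAME"].map(strip_accents), and the result compared against the unaccented names.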

answered Jan 03 '17 at 02:51

Stephen Rauch