I am trying to read some names that include special characters from a JSON file. Unfortunately, even when I pass encoding='utf-8' to json.load, the special characters are not read correctly into my pandas DataFrame.

import json
import pandas as pd

def player_matrix(player_file):
    with open(player_file) as f:
        data = json.load(f, encoding='utf-8')
    all_players = pd.DataFrame(data)
    
    player_dataset = pd.DataFrame(columns=['player_id','name','short name', 'nation', 'team_id' ])
    
    for index,player in all_players.iterrows():
        player_dataset.at[index,'player_id']=player['wyId']
        player_dataset.at[index,'name'] =  str(player['firstName'])+' '+str(player['lastName'])
        player_dataset.at[index,'short name'] =  player['shortName']
        player_dataset.at[index,'nation'] =  player['currentNationalTeamId']
        player_dataset.at[index,'team_id'] =  player['currentTeamId']

    return player_dataset

players_df = player_matrix(playerfile)
players_df

My output looks like this: [image: OUTPUT]

What can I do to read these special characters into a Jupyter notebook as actual characters, rather than as their Unicode escape representation?

EDIT: This is a sample of the JSON file (opened in Excel): [image]

  • Your Unicode in player_file is encoded in some encoding, possibly 'raw_unicode_escape'. **Post a snippet of a few lines of player_file so we can figure out what encoding.** I think you need `open(player_file, encoding='raw_unicode_escape')` per the existing solution I cited. – smci Feb 04 '21 at 22:57
  • I just posted a snippet of the player_file. Unfortunately, the suggestions in the post you recommended did not work – Ian Dragulet Feb 05 '21 at 01:09
  • I eventually got it to work with something similar: `player_dataset.at[index,'shortname'] = player['shortName'].encode().decode('unicode-escape')` – Ian Dragulet Feb 05 '21 at 04:03
  • @IanDragulet: then the suggestion in the post I recommended *did* work: use the same Unicode encoding (and escaping) as originally used to write the JSON. There are various encodings/escapings; you just need to identify which one. – smci Feb 05 '21 at 05:36
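
The fix described in the comments can be sketched roughly as follows. This is only an illustration under assumptions: the sample string `raw` and the helper name `unescape` are hypothetical, and the sketch assumes the file stores doubly escaped sequences, so that literal `\uXXXX` text survives JSON parsing. Note also that in Python 3 the `encoding` argument of `json.load` was ignored and it was removed in Python 3.9, so the file encoding belongs on `open(player_file, encoding='utf-8')` instead. Encoding with `backslashreplace` before decoding is a small variation on the one-liner above that should keep any already-correct non-ASCII characters intact.

```python
import json

# Hypothetical sample: the JSON text holds a doubly escaped sequence,
# so after parsing, the Python string still contains a literal "\u00e9".
raw = r'{"shortName": "O. Demb\\u00e9l\\u00e9"}'


def unescape(text):
    # Turn literal backslash-u sequences into the characters they name.
    # Encoding to ASCII with backslashreplace first converts any real
    # non-ASCII characters into escapes, so they survive the round trip.
    return text.encode("ascii", "backslashreplace").decode("unicode_escape")


data = json.loads(raw)
fixed = unescape(data["shortName"])
print(fixed)
```

In the question's loop, the same helper would be applied to each name field, for example `unescape(player['shortName'])`, before the value is stored in the DataFrame.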
