I have a DataFrame, created as below, whose countries column holds JSON strings:
import pandas as pd

df = pd.DataFrame([['matt', '''[{"c_id": "cn", "c_name": "China"}, {"c_id": "au", "c_name": "Australia"}]'''],
                   ['david', '''[{"c_id": "jp", "c_name": "Japan"}, {"c_id": "cn", "c_name": "China"},{"c_id": "au", "c_name": "Australia"}]'''],
                   ['john', '''[{"c_id": "br", "c_name": "Brazil"}, {"c_id": "ag", "c_name": "Argentina"}]''']],
                  columns=['person', 'countries'])
I'd like to have the output as below, with just the country names, separated by a comma and sorted in alphabetical order:
result = pd.DataFrame([['matt', 'Australia, China'],
                       ['david', 'Australia, China, Japan'],
                       ['john', 'Argentina, Brazil']],
                      columns=['person', 'countries'])
I tried a few methods, but none of them worked. I was hoping the call below would split the JSON appropriately, but it didn't work, perhaps because the JSON is stored as plain strings in the DataFrame?
result = pd.io.json.json_normalize(df, 'c_name')
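
For comparison, here is a minimal sketch of one approach that seems to fit the desired output. It assumes every cell in the countries column is a valid JSON array whose objects all have a "c_name" key. The helper name country_names is hypothetical, introduced only for illustration. Each string is parsed with json.loads first, since json_normalize is documented to take a dict or a list of dicts, not a DataFrame of unparsed strings.

```python
import json

import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame(
    [
        ['matt', '[{"c_id": "cn", "c_name": "China"}, {"c_id": "au", "c_name": "Australia"}]'],
        ['david', '[{"c_id": "jp", "c_name": "Japan"}, {"c_id": "cn", "c_name": "China"},{"c_id": "au", "c_name": "Australia"}]'],
        ['john', '[{"c_id": "br", "c_name": "Brazil"}, {"c_id": "ag", "c_name": "Argentina"}]'],
    ],
    columns=['person', 'countries'],
)


def country_names(cell):
    """Hypothetical helper: parse one JSON string and join its sorted country names."""
    records = json.loads(cell)  # string -> list of dicts
    # Assumes every record has a "c_name" key; a missing key raises KeyError.
    return ', '.join(sorted(record['c_name'] for record in records))


# Replace the JSON strings with the joined names, leaving the original df untouched.
result = df.assign(countries=df['countries'].map(country_names))
print(result)
```

Note that sorted() orders strings by code point, so names with mixed capitalisation would need a key such as str.casefold to sort alphabetically.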