1

I have a pandas dataframe with the below column which is in json format. I need to get the unique json keys from all the rows in the below dataframe. Not sure whether pandas has any inbuilt functions to do this.

 Col3

 {"a": 2000, "b": 1}
 {'c': 'Go bug go','d': 'go_line21','e': 'movie','duration': 2166}
user3447653
  • 3,968
  • 12
  • 58
  • 100

2 Answers2

0

This question would be better with actual code. The column as given isn't in json format, since it has both double-quoted (correct) and single-quoted (not json format) elements.

It's also not clear whether the columns are being stored as dicts or as strings.

That said, drawing from this answer on converting from string to dict and this answer on splitting a dictionary into columns, here's how to get the answer if the columns are in fact in json format : use json.loads and json_normalize then grab the column names.

import json
import pandas as pd
from pandas.io.json import json_normalize

df = pd.DataFrame({'Col3' : 
               ['{"a": 2000, "b": 1}',
                '{"c": "Go bug go", "d": "go_line21", "e": "movie", "duration": 2166}']})

json_normalize(df['Col3'].apply(json.loads)).columns.to_list()

If you have to correct incorrect quoting, then you need to follow this.

climatebrad
  • 1,286
  • 8
  • 13
0

You can create a new dataframe from it and grab the keys as a list:

list(pd.DataFrame(df['Col3'].to_list()).columns)

output:

# ['a', 'b', 'c', 'd', 'e', 'duration']
oppressionslayer
  • 6,942
  • 2
  • 7
  • 24