This is the first question I'm ever asking on StackOverflow, so please don't tear me to shreds too harshly.
I have a Pandas DataFrame containing a "fieldsOfInterest" column with JSON data, similar to this (this may not be an exact reproduction of my real data; I'll be away for a few hours and will update it afterwards):
In:
import pandas as pd

df = pd.DataFrame([
    ["1", [{"code":"FOI_AGRICULTURE_FOOD|FOI_AF_FOOD_INDUSTRY"}, {"code":"FOI_AGRICULTURE_FOOD|FOI_AF_FORESTRY"}]],
    ["2", [{"code":"FOI_AGRICULTURE_FOOD|FOI_AF_SOMETHING_ELSE"}, {"code":"FOI_AGRICULTURE_FOOD|FOI_AF_FORESTRY"}]]
], columns = ["id", "fieldsOfInterest"])
df
Out:
   id                                   fieldsOfInterest
0   1  [{'code': 'FOI_AGRICULTURE_FOOD|FOI_AF_FOOD_IN...
1   2  [{'code': 'FOI_AGRICULTURE_FOOD|FOI_AF_SOMETHI...
I want to add a new column that, for each row, contains a list of all the "code" values from that row's entry in the old column. For the first row above, that would be:
['FOI_AGRICULTURE_FOOD|FOI_AF_FOOD_INDUSTRY',
'FOI_AGRICULTURE_FOOD|FOI_AF_FORESTRY']
I have a solution that works for a single row:
foi_normalized = pd.json_normalize(df["fieldsOfInterest"].iloc[1])
foi_codes = foi_normalized["code"]
foi_list = foi_codes.tolist()
print(foi_list)
But when I try a similar approach for the whole column...
def interest_reader(foi_old):
    foi_normalized = pd.json_normalize(foi_old)
    foi_codes = foi_normalized["code"]
    foi_list = foi_codes.tolist()
    return foi_list
df["fieldsOfInterest_new"] = df["fieldsOfInterest"].apply(interest_reader)
I get the error below:
File "...", line 15, in <module>
df["fieldsOfInterest_new"] = df["fieldsOfInterest"].apply(interest_reader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "...", line 4771, in apply
return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "...", line 1105, in apply
return self.apply_standard()
^^^^^^^^^^^^^^^^^^^^^
File "...", line 1156, in apply_standard
mapped = lib.map_infer(
^^^^^^^^^^^^^^
File "pandas\_libs\lib.pyx", line 2918, in pandas._libs.lib.map_infer
File "...", line 11, in interest_reader
foi_normalized = pd.json_normalize(foi_old)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "...", line 446, in _json_normalize
raise NotImplementedError
NotImplementedError
I have tried several other approaches, but nothing has worked. I'm now thinking about treating the values simply as lists of dictionaries and, for each row, looping through them to collect the value stored under the "code" key. I'd be grateful for any pointers, thank you!
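For reference, here is a minimal sketch of that dictionary-based idea. It rests on assumptions rather than on the real data: that each cell is either a list of dicts or something else entirely (a missing value, for instance), and that non-list cells should simply become an empty list. As far as I can tell, `pd.json_normalize` raises `NotImplementedError` when it is handed a value that is neither a dict nor a non-string iterable (such as NaN or a raw JSON string), so a cell like that in the real column may be what triggers the traceback above. The helper name `extract_codes` and the `None` row are made up for illustration.

```python
import pandas as pd

# Small stand-in for the real data; the second row is a hypothetical
# missing entry, added only to exercise the non-list branch below.
df = pd.DataFrame(
    [
        ["1", [{"code": "FOI_AGRICULTURE_FOOD|FOI_AF_FOOD_INDUSTRY"},
               {"code": "FOI_AGRICULTURE_FOOD|FOI_AF_FORESTRY"}]],
        ["2", None],
    ],
    columns=["id", "fieldsOfInterest"],
)


def extract_codes(entries):
    """Collect the "code" value from each dict in a list (hypothetical helper)."""
    # Assumption: anything that is not a list (None, NaN, a string, ...)
    # is treated as "no codes" instead of raising an error.
    if not isinstance(entries, list):
        return []
    # Skip items that are not dicts or that lack a "code" key.
    return [e["code"] for e in entries if isinstance(e, dict) and "code" in e]


df["fieldsOfInterest_new"] = df["fieldsOfInterest"].apply(extract_codes)
print(df["fieldsOfInterest_new"].tolist())
```

If the real column turns out to hold JSON strings rather than parsed lists, the strings would need to go through `json.loads` first; this sketch does not cover that case.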