How to store JSON content in DataFrame column

Question

I have around 3500 JSON records with shapes like this:

{'text_analysis_raw_response': [{'entity_group': 'PER', 'score': 0.9537515640258789, 'word': 'God Almighty', 'start': 17, 'end': 29}, {'entity_group': 'ORG', 'score': 0.7446494102478027, 'word': 'Cali', 'start': 51, 'end': 55}, {'entity_group': 'LOC', 'score': 0.43644213676452637, 'word': 'te', 'start': 58, 'end': 60}, {'entity_group': 'LOC', 'score': 0.9999852180480957, 'word': 'Shaqraq', 'start': 128, 'end': 135}, {'entity_group': 'LOC', 'score': 0.9999912977218628, 'word': 'Al-Muqdadiya', 'start': 157, 'end': 169}, {'entity_group': 'PER', 'score': 0.9992551207542419, 'word': 'God', 'start': 346, 'end': 349}], 'text_locations_names': ['Shaqraq', 'Al-Muqdadiya'], 'et_sec': 7}

I want to store the text_locations_names in a new column in my DataFrame.

Using this line of code:

data.at[index, 'new_locations_names'] = result['text_locations_names']

It gives me the following error:

TypeError                                 Traceback (most recent call last)
TypeError: float() argument must be a string or a number, not 'list'

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-7-4ed103f87b65> in <module>
      2     result = extract_location_with_xlm_roberta(row['translated_title'])
      3     print(result)
----> 4     data.at[index, 'new_locations_names'] = result['text_locations_names']
      5 #     break

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in __setitem__(self, key, value)
   2089             return
   2090 
-> 2091         return super().__setitem__(key, value)
   2092 
   2093 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in __setitem__(self, key, value)
   2040             raise ValueError("Not enough indexers for scalar access (setting)!")
   2041 
-> 2042         self.obj._set_value(*key, value=value, takeable=self._takeable)
   2043 
   2044 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in _set_value(self, index, col, value, takeable)
   3145             validate_numeric_casting(series.dtype, value)
   3146 
-> 3147             series._values[loc] = value
   3148             # Note: trying to use series._set_value breaks tests in
   3149             #  tests.frame.indexing.test_indexing and tests.indexing.test_partial

ValueError: setting an array element with a sequence.

Any thoughts on solving it?

An example of the data to look like:

Can you provide an example of what you would like the data to look like? — ArchAngelPwn, May 23 '22 at 14:06
Does this answer your question? [ValueError: setting an array element with a sequence. for Pandas](/q/33221655/4518341). If it doesn't, please make a [mre] including an example version of `data`. See [How to make good reproducible pandas examples](/q/20109391/4518341) for specifics. And [please don't post pictures of text](https://meta.stackoverflow.com/q/285551/4518341). — wjandrea, May 23 '22 at 20:50
Also, it looks like JSON is irrelevant to the problem. `result` is a dict already, and `result['text_locations_names']` is a list, that's all that matters. For a minimal example, you only need to worry about the list part. — wjandrea, May 23 '22 at 20:54
@ArchAngelPwn That is true, the list part is my problem. I want to store the list in the column rows, any idea? — , May 24 '22 at 09:33
@ArchAngelPwn can you see this please? https://stackoverflow.com/questions/72392108/how-to-skip-not-null-values-in-dataframe-column — , May 26 '22 at 14:23

karel van dongen · Answer 1 · 2022-05-23T17:04:11.770


You are trying to assign a list to a single cell of a column. If you want it all in one row, join the list into a single string:

data.at[index,'new_locations_names'] = ','.join(result["text_locations_names"])
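
A minimal, self-contained sketch of the line above. The small `data` frame and the `results` list are hypothetical stand-ins for the asker's DataFrame and for the dicts returned by `extract_location_with_xlm_roberta`. The traceback's `float()` message suggests the target column had a float dtype, so this sketch assumes the column is created up front with object dtype:

```python
import pandas as pd

# Hypothetical stand-ins for the asker's `data` and the per-row `result` dicts.
data = pd.DataFrame({"translated_title": ["first title", "second title"]})
results = [
    {"text_locations_names": ["Shaqraq", "Al-Muqdadiya"]},
    {"text_locations_names": []},
]

# Create the target column with object dtype so each cell can hold a string.
data["new_locations_names"] = pd.Series(
    [None] * len(data), index=data.index, dtype=object
)

for index, result in zip(data.index, results):
    # Join the list into one comma-separated string per row.
    data.at[index, "new_locations_names"] = ",".join(result["text_locations_names"])

print(data)
```

An empty list becomes an empty string here, so rows without locations stay distinguishable from rows that were never processed (which remain `None`).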

edited May 23 '22 at 17:04

answered May 23 '22 at 15:20

karel van dongen


Thanks, but I did not understand your solution. – May 23 '22 at 16:19
You are extracting a list from the JSON, and then you can just combine it into a string. – karel van dongen May 23 '22 at 17:02
Great! How can I do that as a list? I want to store them as a list, not a normal string. It worked, but it is saving them as a string. – May 23 '22 at 20:49
You can't add a list to a DataFrame row!! But you can do something like this to split it into new rows: `df.assign(var1=df['new_locations_names'].str.split(',')).explode('var1')` – karel van dongen May 23 '22 at 21:04
can you see this please? https://stackoverflow.com/questions/72392108/how-to-skip-not-null-values-in-dataframe-column – May 26 '22 at 14:22
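
On the follow-up about keeping real lists instead of strings: as far as I know, pandas can hold a Python list in a cell when the column has object dtype, so `.at` should accept the list once the column is created that way. A rough sketch under that assumption, with a hypothetical `data` frame and `results` list standing in for the asker's objects:

```python
import pandas as pd

# Hypothetical stand-ins for the asker's `data` and the per-row `result` dicts.
data = pd.DataFrame({"translated_title": ["first title", "second title"]})
results = [
    {"text_locations_names": ["Shaqraq", "Al-Muqdadiya"]},
    {"text_locations_names": ["Shaqraq"]},
]

# An object-dtype column can store arbitrary Python objects, including lists.
data["new_locations_names"] = pd.Series(
    [None] * len(data), index=data.index, dtype=object
)

for index, result in zip(data.index, results):
    # Assign the list itself to a single cell.
    data.at[index, "new_locations_names"] = result["text_locations_names"]

# Optional: one row per location name, repeating the original index.
exploded = data.explode("new_locations_names")
print(exploded)
```

The trade-off is that list cells do not round-trip cleanly through CSV (they are written as their string representation), so the comma-joined string may still be the simpler choice if the frame is saved to disk.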
