I read my data from Excel into a pandas DataFrame. One of the columns holds values that look like dictionaries but are stored as strings. I want to convert every row in that column (more than 40k) from a string to an actual dictionary. Printing the column gives:

df['fruit']
 0    NaN                            
 1    {'apple': [{'A': 1, 'B': 2, ...
 2    {'apple': [{'A': 3, 'B': 4, ...
 3    {'orange': [{'A': 5, 'B': 6...   
 4    {'apple': [{'A': 0, 'B': 9, ...

If I call to_dict() on the column, I get the following:

df['fruit'].to_dict()
{0: NaN,
 1: "{'apple': [{'A': 1, 'B': 2, ...}",
 2: "{'apple': [{'A': 3, 'B': 4, ...}",
 3: "{'orange': [{'A': 5, 'B': 6...}",
 4: "{'apple': [{'A': 0, 'B': 9, ...}",

Then, when using to_dict('list'), I got the following error message.

df['fruit'].to_dict('list')
....
TypeError: unsupported type: <class 'str'>

I want the dictionary format because I only need the values stored under 'B' inside the 'orange' entries.

Any help would be greatly appreciated!

SEan1820
  • `df['fruit'].to_dict()` returns a dictionary of the form: `{index1: value1, index2: value2,...}` as you saw from that output. It **does not** convert each cell (value) into a dictionary for you. Maybe you should be looking at `pd.json_normalize`. – Quang Hoang Sep 18 '22 at 03:35
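To illustrate the point in the comment above, here is a minimal sketch on a made-up two-row Series (the sample values are assumptions, not the asker's data). It shows that `to_dict()` only wraps the column in an outer dictionary keyed by index and leaves each cell as a string.

```python
import pandas as pd

# Hypothetical sample: each cell is a string that merely looks like a dict.
s = pd.Series([
    "{'apple': [{'A': 1, 'B': 2}]}",
    "{'orange': [{'A': 5, 'B': 6}]}",
])

as_dict = s.to_dict()

print(type(as_dict))     # the outer container is a dict keyed by index label
print(type(as_dict[0]))  # each value is still a str, not a dict
```

So the cells themselves still have to be parsed before a lookup such as `['orange']` can work.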

1 Answer

Use:

import pandas as pd
df = pd.DataFrame({'string dict':["{'a': 1}", "{'b':2}"]})

df['string dict'].apply(eval)

which can be validated as follows:

type(df['string dict'].apply(eval)[0])

returns:

dict

Based on your comment, replace the NaN rows with an empty-dict string before evaluating:

df['string dict'].fillna('{}').apply(eval)

I reproduced your error using the following test data:

import numpy as np
df = pd.DataFrame({'string dict':["{'a': 1}", "{'b':2}", np.nan, 2]})
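If the error persists, it may help to check which cells are not strings before evaluating anything. The following is only a sketch on the same test data; the variable names `type_counts` and `is_str` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'string dict': ["{'a': 1}", "{'b':2}", np.nan, 2]})

# Count how many cells there are of each Python type.
type_counts = df['string dict'].map(type).value_counts()
print(type_counts)

# Boolean mask: True where the cell is a string and can be parsed at all.
is_str = df['string dict'].map(lambda v: isinstance(v, str))

# Rows that would make eval() fail (here the NaN and the number 2).
print(df[~is_str])
```

Any row listed by the last line needs a decision (drop it, or replace it with a default) before the column is converted.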
keramat
  • Thank you for your answer. When running the code, I receive the following message: `eval() arg 1 must be a string, bytes or code object`. I think it is because of the NaN rows. – SEan1820 Sep 18 '22 at 04:37
  • Look at the answer again. – keramat Sep 18 '22 at 04:40
  • Did that solve your problem? – keramat Sep 18 '22 at 04:49
  • Thanks so much for the second reply. But still, the same error message comes out. – SEan1820 Sep 18 '22 at 04:50
  • Look at the answer one more time. I think your column contains other data types, such as numbers. Please check this, and if that is the case, tell me what you want to do with those values, for example remove them. – keramat Sep 18 '22 at 05:16
  • I reloaded the raw data, ran the code you provided, and it works. I had probably been running your code on data I had already modified. Thank you! Now I will get the specific data using the dictionary format. Have a good day! – SEan1820 Sep 18 '22 at 05:26
  • `eval` is a pretty [dangerous command](https://stackoverflow.com/questions/1832940/why-is-using-eval-a-bad-practice), especially when using external input. Check `pd.eval` as a safer substitution, or use `pd.json_normalize` – mozway Sep 18 '22 at 06:07
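Following up on the safety concern, one commonly used alternative to `eval` for strings like these is the standard library's `ast.literal_eval`, which only accepts Python literals. Note that this is a swapped-in technique, not something proposed in the answer or comments. The sketch below uses made-up sample data, and `parse_cell`, `fruit_parsed` and `orange_B` are hypothetical names. It also shows one possible way to pull the 'B' values out of the 'orange' entries, which was the original goal.

```python
import ast

import numpy as np
import pandas as pd

# Hypothetical sample shaped like the 'fruit' column in the question.
df = pd.DataFrame({
    'fruit': [
        np.nan,
        "{'apple': [{'A': 1, 'B': 2}]}",
        "{'orange': [{'A': 5, 'B': 6}, {'A': 7, 'B': 8}]}",
    ]
})


def parse_cell(value):
    """Parse a dict-like string; return {} for NaN, non-strings or bad input."""
    if not isinstance(value, str):
        return {}
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # Raised for anything that is not a plain Python literal.
        return {}
    return parsed if isinstance(parsed, dict) else {}


df['fruit_parsed'] = df['fruit'].map(parse_cell)

# For each row, collect every 'B' found under 'orange' (empty list if none).
df['orange_B'] = df['fruit_parsed'].map(
    lambda d: [item['B'] for item in d.get('orange', []) if 'B' in item]
)

print(df['orange_B'].tolist())  # [[], [], [6, 8]]
```

Returning an empty dict for unparseable cells is a design choice made here to keep the column uniform; dropping those rows instead would work just as well if they are not needed.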