0

After exhaustive search and trying to make it work... here I am.

Summary of what I need: I need to create and read csv files, so I could freely manipulate the data from the files. For example:

row1 = {"col1": "value1", "col2": 0.5, "col3": [{"key1": 0.2}, {"key2": 0.3}]
df = pandas.DataFrame(row1, columns=["col1", "col2", "col3"]
df.to_csv("test.csv")

The file that I get looks like that:

    col1    col2    col3
0   value1  0.5   [{'key1': 0.2}, {'key2': 0.3}]

After that I can read the file with:

df = pandas.read_csv("test.csv")

The problems come with cases such as col3. I I foolishly thought that when I convert the data frame:

df = list(df.T.to_dict().values())

I could access col3 with:

for row in df:
    wanted_data = row["col3"]

I thought that I will get a list with two dictionaries in it. What I actually get is a string.

How can I access key1, key2 (the keys are different) and their corresponding values? I already tried with:

json.loads(wanted_data)

But no matter what I do I get those kinds of errors:

Traceback (most recent call last):
  File "<pyshell#230>", line 1, in <module>
    now("tst_2.csv")
  File "D:\test_DUAL_up_to_RoleMap_official_edition - Copy\kb_test_manipulation.py", line 84, in now
    res = json.loads(new_supercl)
  File "C:\Users\NOTEBOOK2\AppData\Local\Programs\Python\Python35-32\lib\json\__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "C:\Users\NOTEBOOK2\AppData\Local\Programs\Python\Python35-32\lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\NOTEBOOK2\AppData\Local\Programs\Python\Python35-32\lib\json\decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 3 (char 2)

Any ideas how I should store and/or read the data so I could properly access col3?

petro.y
  • 16
  • 3
  • Possible duplicate of [Extract nested JSON embedded as string in Pandas dataframe](https://stackoverflow.com/questions/34092808/extract-nested-json-embedded-as-string-in-pandas-dataframe) – Dmitry M Oct 10 '18 at 18:16
  • What's the string `json.decode` is passed? – SIGSTACKFAULT Oct 10 '18 at 18:17
  • The exact string is: string = [{'mama': 0.5}, {'tati': 0.5}] And it seems that: for x in string: x.replace("'", '"') does not turn the single quotes into double quotes... – petro.y Oct 10 '18 at 18:31

1 Answers1

0

The problem is when you saved the initial DataFrame it was already storing column 3 as strings. Your csv actually looks like this:

    col1    col2    col3
0   value1  0.5     {'key1': 0.2}
1   value1  0.5     {'key2': 0.3}

                    # Note this column is string.

Therefore when you read it, col3 is giving you a str object instead of a dict as you might expect.

if there's no explicit reason for you to store col3 as a string representation of the dictionary, store only the value. You should update row1['col3'] to reflect this instead:

{'col1': 'value1', 'col2': 0.5, 'col3': [0.2, 0.3]}

Assuming you only have a dictionary with one key in each of item of col3, you can convert it as such:

row1['col3'] = [[i for i in v.values()][0] for v in row1['col3']]

So that your df will now give you only the values of the dictionary:

     col1  col2  col3
0  value1   0.5   0.2
1  value1   0.5   0.3

After that your df.to_csv and df.read_csv should be fine.

I feel like there's a more obvious solution to this than I'm missing, so any pandas expert please chime in.

r.ook
  • 13,466
  • 2
  • 22
  • 39
  • My csv looks like: ` col1 col2 col3 0 value1 0.5 [{'key1': 0.2}, {'key2': 0.3}] ` The thing is that key1 and key2 ARE different and I need to be able to access both (plus their associated values). Sorry, I will edit my initial post. – petro.y Oct 10 '18 at 19:32