So I have a list where each entry looks something like this:

"{'A': array([1]), 'B': array([2]), 'C': array([3])}"

I am trying to get a dataframe that looks like this

    A   B   C
0   1   2   3
1   4   5   6 
2   7   8   9

But I'm having trouble converting the format into something that can be read into a DataFrame. I know that pandas should automatically convert dicts into dataframes, but since my list elements are surrounded by quotes, it's getting confused and giving me

               0
0  {'A': array([1]), 'B': array([2]), 'C': array([3])}
...

I originally asked this question with an oversimplified example dict, {'A': 1, 'B': 2, 'C': 3}, for which methods such as ast.literal_eval and eval would typically work. With arrays as values, however, I am running into NameError: name 'array' is not defined.

salamander

1 Answer

Assuming those really are arrays of length 1, this hackery should do the job:

data = [
  "{'A': array([1]), 'B': array([2]), 'C': array([3])}",
  "{'A': array([4]), 'B': array([5]), 'C': array([6])}",
  "{'A': array([7]), 'B': array([8]), 'C': array([9])}"
]

import ast
import pandas as pd
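# Strip the "array([" and "])" wrappers so each string becomes a plain dict literal
# that ast.literal_eval can parse (only valid for single-element arrays).
data = [ast.literal_eval(d.replace('array([', '').replace('])', '')) for d in data]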
a = pd.DataFrame(data)
print(a)

Output:

   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9
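
If some arrays hold more than one element, the replace() trick above would leave something like 'A': 1, 2 behind, which is not a valid dict literal. A possible variation, sketched here under the assumption that every value is a flat one-dimensional array printed without a dtype= suffix, is to rewrite array([...]) into [...] with a regular expression and unwrap single-element lists afterwards. The names ARRAY_CALL, parse_row and rows are made up for this illustration.

```python
import ast
import re

# Hypothetical sample input in the same shape as the question's list.
rows = [
    "{'A': array([1]), 'B': array([2]), 'C': array([3])}",
    "{'A': array([4]), 'B': array([5]), 'C': array([6])}",
]

# Matches array([...]) for flat arrays and captures the bracketed list.
# Assumption: no nested brackets and no ", dtype=..." inside the call.
ARRAY_CALL = re.compile(r"array\((\[[^\]]*\])\)")

def parse_row(text):
    """Turn "array([...])" into "[...]" so ast.literal_eval accepts it."""
    as_lists = ast.literal_eval(ARRAY_CALL.sub(r"\1", text))
    # Unwrap single-element lists to scalars; leave longer lists untouched.
    return {k: v[0] if len(v) == 1 else v for k, v in as_lists.items()}

records = [parse_row(r) for r in rows]
print(records)
```

The resulting list of dicts can be passed to pd.DataFrame(records) in the same way as in the answer above.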
Tim Roberts
  • It worked. Seems like a strange problem to have. I wonder if there was another way that would have avoided this issue completely. – salamander Aug 08 '22 at 17:21
  • Well, the real solution is to create your original data more intelligently. There's no way you should be persisting data containing the string representation of numpy objects. There are lots of ways to store tabular data in ways that can be retrieved easily. A real database comes to mind. – Tim Roberts Aug 08 '22 at 17:32
  • I absolutely agree with you. I will change the output data format in the program that generated it. – salamander Aug 09 '22 at 01:53
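
As a rough illustration of the "real database" suggestion in the comments, here is a minimal sketch using Python's built-in sqlite3 module. The table name results and the sample rows are invented for the example, and it assumes the values have already been converted to plain Python ints before being stored (for a NumPy array, .tolist() would be one way to do that).

```python
import sqlite3

# Hypothetical rows, already converted to plain Python ints.
rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# ":memory:" keeps the example self-contained; a file path would persist the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (A INTEGER, B INTEGER, C INTEGER)")
conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
conn.commit()

# Reading the data back needs no string parsing at all.
loaded = conn.execute("SELECT A, B, C FROM results ORDER BY rowid").fetchall()
conn.close()
print(loaded)
```

With pandas, pd.read_sql_query("SELECT A, B, C FROM results", conn) on an open connection should return the same data as a DataFrame directly.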