
Consider this JSON-formatted string:

json_string = '{"SYM": ["this_string","this_string","this_string"],"DATE": ["NaN","NaN","NaN"],"YEST": ["NaN","NaN","NaN"],"other_DATE": ["NaN","NaN","NaN"],"SIZE": ["NaN","NaN","NaN"],"ACTIVITY": ["2019-09-27 14:18:28.000700 UTC","2019-09-27 14:18:28.000700 UTC","2019-09-27 14:18:28.000600 UTC"]}'

I can import it into a numpy.recarray with these operations:

result      = ast.literal_eval(json_string)
names       = list(result.keys())
formats     = ['O'] * len(names)
dtype       = dict(names = names, formats=formats)
array       = numpy.array(result.items(), dtype=dtype)

This seems like a lot of hops. Is there a faster way?

user189035
  • How about `numpy.array(json.loads(json_string).items(), dtype=dtype)`? `literal_eval` will fail for something like `{"FOO": [null, NaN]}`, while the json module will correctly load it as `{u'FOO': [None, nan]}`. – Paulo Scardine Oct 03 '19 at 15:42
  • Perfect in my case, as my dtype is pre-set. Thanks! – user189035 Oct 03 '19 at 15:54
  • Why do you extract `names` and `formats` if you don't use them? `json_string` evaluates as a dictionary. Structured array `data` is supposed to be a list of tuples, where the tuples match the `dtype`. There isn't any special `json` processing in `numpy`. – hpaulj Oct 03 '19 at 16:35

1 Answer


You don't really need the second and third steps, and you can condense the first and last into a single line:

array = numpy.array(ast.literal_eval(json_string).items(), dtype=dtype)

That said, I would use the json module instead of ast.literal_eval because literal_eval will fail for valid JSON like {"FOO": [null, NaN]}.

import json
numpy.array(json.loads(json_string).items(), dtype=dtype)
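
As a rough sketch of how this might look on Python 3, where `dict.items()` returns a view rather than a list: the version below transposes the parsed columns into row tuples before handing them to numpy, since a structured array expects one tuple per row. The shortened `json_string`, the all-object dtype, and the `tricky` example string are assumptions made for illustration, not part of the original question.

```python
import ast
import json

import numpy

# Shortened stand-in for the string in the question (illustrative only).
json_string = '{"SYM": ["this_string", "this_string"], "DATE": ["NaN", "NaN"]}'

data = json.loads(json_string)      # dict: column name -> list of values
names = list(data)                  # key order follows the JSON text (Python 3.7+)
dtype = numpy.dtype([(name, "O") for name in names])  # assumed: object fields

rows = list(zip(*data.values()))    # transpose columns into one tuple per row
array = numpy.array(rows, dtype=dtype)

# Why json.loads rather than ast.literal_eval: null and bare NaN are accepted
# by the json module but are not Python literals.
tricky = '{"FOO": [null, NaN]}'
parsed = json.loads(tricky)         # null -> None, NaN -> float nan
try:
    ast.literal_eval(tricky)
    literal_eval_ok = True
except (ValueError, SyntaxError):
    literal_eval_ok = False
```

Here `array["SYM"]` gives the column as an object array, and each element of `array` is one row. If the dtype is already fixed elsewhere, as in the comments above, the `names` and `dtype` lines can be dropped.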
Paulo Scardine
  • I also prefer your json-based solution to the `ast.literal_eval` one (my choice of the latter was guided by ignorance, not anything specific about the problem at hand). – user189035 Oct 03 '19 at 15:59