Fastest way to import JSON formatted string to numpy.recarray?

Question

Consider this JSON formatted string:

json_string = '{"SYM": ["this_string","this_string","this_string"],"DATE": ["NaN","NaN","NaN"],"YEST": ["NaN","NaN","NaN"],"other_DATE": ["NaN","NaN","NaN"],"SIZE": ["NaN","NaN","NaN"],"ACTIVITY": ["2019-09-27 14:18:28.000700 UTC","2019-09-27 14:18:28.000700 UTC","2019-09-27 14:18:28.000600 UTC"]}'

I can import it into a numpy.recarray with these operations:

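import ast
import numpy

result      = ast.literal_eval(json_string)   # parse the string into a dict of columns
names       = list(result.keys())
formats     = ['O'] * len(names)
dtype       = dict(names=names, formats=formats)
# one tuple per record, so that each tuple matches the fields of dtype
array       = numpy.array(list(zip(*result.values())), dtype=dtype)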

This seems like a lot of hops. Is there a faster way?

How about `numpy.array(json.loads(json_string).items(), dtype=dtype)`? literal_eval will fail for something like `{"FOO": [null, NaN]}` while the json module will correctly load it as `{u'FOO': [None, nan]}` — Paulo Scardine, Oct 03 '19 at 15:42
Why do you extract `names` and `formats` if you don't use them? `json_string` evaluates as a dictionary. Structured array `data` is supposed to be a list of tuples, where the tuples match the `dtype`. There isn't any special `json` processing in `numpy`. — hpaulj, Oct 03 '19 at 16:35
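To illustrate the point about structured array data being a list of tuples that match the dtype, here is a minimal sketch, assuming Python 3 and a shortened stand-in for the question's string. The names `columns`, `rows`, `structured`, and `records` are illustrative and not from the original post.

```python
import json
import numpy as np

# Shortened stand-in for the question's json_string (illustrative data).
json_string = '{"SYM": ["a", "b", "c"], "DATE": ["NaN", "NaN", "NaN"]}'

columns = json.loads(json_string)        # dict: field name -> list of values
names = list(columns.keys())
dtype = {"names": names, "formats": ["O"] * len(names)}

# Transpose the columns into rows: one tuple per record, one item per field.
rows = list(zip(*columns.values()))

structured = np.array(rows, dtype=dtype)  # structured ndarray, one entry per record
records = structured.view(np.recarray)    # same data, with attribute access to fields
```

With this layout each field can be read either as `structured["SYM"]` or, on the recarray view, as `records.SYM`.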

Paulo Scardine · Accepted Answer · 2019-10-03T16:55:57.693

You don't really need the second and third steps, and you can condense the first and last into a single line:

array = numpy.array(list(zip(*ast.literal_eval(json_string).values())), dtype=dtype)

That said, I would use the json module instead of ast.literal_eval, because literal_eval only parses Python literals and will fail on input like {"FOO": [null, NaN]}, which json.loads accepts.

import json
array = numpy.array(list(zip(*json.loads(json_string).values())), dtype=dtype)
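The difference between the two parsers can be checked directly. The following is a small sketch, assuming Python 3 and the default settings of `json.loads`. The variable names `text`, `parsed`, and `literal_eval_ok` are illustrative.

```python
import ast
import json
import math

text = '{"FOO": [null, NaN]}'

# json.loads maps null to None and, by default, accepts the non-standard NaN.
parsed = json.loads(text)

# ast.literal_eval only understands Python literals, so the bare names
# null and NaN are rejected with a ValueError.
try:
    ast.literal_eval(text)
    literal_eval_ok = True
except ValueError:
    literal_eval_ok = False
```

Here `parsed["FOO"]` holds `None` and a float NaN, while `literal_eval_ok` ends up `False`.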

edited Oct 03 '19 at 16:55

answered Oct 03 '19 at 15:55


I also prefer your json-based solution to the `ast.literal_eval` one (my choice of the latter was guided by ignorance, not anything specific about the problem at hand). – user189035 Oct 03 '19 at 15:59
