
An aiohttp application fetches JSON from an external resource and needs to pass it as the request body of another request.

To avoid serialization/deserialization overhead, ujson is used, and the JSON is simply passed along to the subsequent request without ever being loaded or dumped. This works, but the JSON can only be forwarded this way, not manipulated.

There is probably no way to manipulate it without deserializing it, but since ujson is used, the object is first deserialized as a C object. With that in mind, is there a way to keep manipulating this object at the C level without ever converting it to a Python dict? Example operations would be deleting keys from the JSON, creating a new JSON with just a subset of the original, or checking whether a given key exists.
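
For reference, here is a minimal sketch of the three operations listed above done the conventional way, through a full round trip into a Python dict. It uses the standard-library `json` module so that it runs anywhere (ujson offers `loads`/`dumps` with the same basic call shape). The payload and key names are hypothetical.

```python
import json

# Hypothetical payload standing in for the body fetched from the external resource.
raw = b'{"id": 7, "name": "widget", "secret": "s3cr3t", "tags": ["a", "b"]}'

doc = json.loads(raw)  # full deserialization into a Python dict

has_secret = "secret" in doc  # check whether a given key exists
del doc["secret"]             # delete a key

# Build a new object holding only a subset of the original keys.
subset = {key: doc[key] for key in ("id", "name") if key in doc}

body = json.dumps(subset)  # serialize again for the outgoing request
```

This is the baseline whose cost the question is trying to avoid: every value in the document becomes a Python object, even the ones that are never touched.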

Rodrigo Oliveira
  • I don't understand why you use `ujson`, a JSON parser, if you don't manipulate the JSON at all. Just pass it through as an opaque string instead…?! But, yes, if you want to manipulate JSON without entirely un-/serialising it, maybe a streaming JSON parser/encoder would be the answer; while you stream through it, you simply skip outputting certain parts. Can't recommend anything in particular though. – deceze Apr 26 '19 at 12:04
  • @deceze sorry for the confusing statement, `ujson` is of no use when not serializing/deserializing the JSON object; it is there for exploring JSON manipulation down the road. The problem, though, is that when using `loads`/`dumps` I lose all the performance advantages of the `ujson` C implementation by bringing the whole object into a Python dict. I would like to stay at the C level for the manipulation. – Rodrigo Oliveira Apr 26 '19 at 12:39

1 Answer


This might help you out: https://github.com/lemire/simdjson

I don't completely understand the use case, but the project describes itself as follows:

"We provide a fast parser, that fully validates an input according to various specifications. The parser builds a useful immutable (read-only) DOM (document-object model) which can be later accessed."

It's a bit specific: it requires CPUs that support certain instruction sets, as well as specific compilers, but it seems to me it could fit your use case.

It also has wrappers for other languages, including Python.
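
As a rough sketch of how the Python wrapper might be used for the "subset of keys" case: this assumes the pysimdjson package is installed and that, as I understand its documentation, `simdjson.Parser().parse()` returns a lazy proxy supporting `in` and `[]` lookups. The `pick_fields` helper and the payload are hypothetical, and the fallback to the standard `json` module is there only so the sketch runs without pysimdjson.

```python
import json

try:
    import simdjson  # pysimdjson; assumed to be installed
except ImportError:
    simdjson = None  # fall back to the standard library so the sketch still runs


def pick_fields(raw: bytes, wanted: tuple) -> str:
    """Return a JSON string holding only the top-level keys listed in `wanted`."""
    if simdjson is not None:
        # Assumption: parse() yields a proxy over the parsed document, and only
        # the values that are actually looked up become Python objects.
        doc = simdjson.Parser().parse(raw)
    else:
        doc = json.loads(raw)  # plain dict, everything deserialized

    picked = {key: doc[key] for key in wanted if key in doc}
    return json.dumps(picked)


# Hypothetical payload; only scalar fields are picked here.
raw = b'{"id": 7, "name": "widget", "secret": "s3cr3t", "tags": ["a", "b"]}'
body = pick_fields(raw, ("id", "name", "missing"))
```

Only scalar fields are picked here; nested objects or arrays would come back as proxies and would need converting to plain Python containers before `json.dumps` could serialize them. The DOM is read-only, so deleting a key still means building a new object from the parts you want to keep.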

  • This fits my use case well. Although the manipulation cannot stay fully at the C level, I can bring just a tiny part of the JSON into Python as an object and avoid deserializing the whole document, as demonstrated in the Python API example: https://github.com/TkTech/pysimdjson#example – Rodrigo Oliveira Apr 26 '19 at 12:43