I am looking for a way to write newline-delimited JSON using Json.Net. I need this to export data to Google BigQuery: https://cloud.google.com/bigquery/data-formats#json_format

Currently the only way I've found to do it is to loop through my collection and serialise each object one at a time, but I wondered if there was a better way.

I came across this previous question, but the answer only explains how to read newline-delimited JSON, not how to write it.

Jon Clarke
  • If you have code that currently works, then what are you asking? What is the concrete problem you would like help in solving? – dbc Jul 08 '16 at 20:18
  • @dbc Because I want to know if there is a better way. Rather than looping round my List collection, I want to know if there is a way of passing my collection to the Json serializer and it can format the json the way I need. – Jon Clarke Jul 11 '16 at 08:09
  • @JonClarke Curious if you're able to format/save your List to new line delimited JSON? As of the moment, we also do the same thing - looping and serializing each to a C# object. –  Nov 24 '17 at 06:54
  • @projectzerotohero yes I'm still doing it the same way, I haven't found a better way to do it yet – Jon Clarke Nov 27 '17 at 09:04

1 Answer

There is no such thing as newline-delimited JSON. What you are asking about is storing JSON objects on single, separate lines. This is used by many big data and event processing products, including Azure Stream Analytics, Hive, Google's BigQuery, etc.

This method of storage is used because it makes parallel processing a lot easier:

  • When reading, a single file can be partitioned easily by line without actually parsing the entire text, and assigned to different threads or workers.
  • Lines can be processed independently, without waiting for the entire text to be parsed. This allows you to take advantage of, e.g., asynchronous operations and/or Dataflow to read and parse concurrently.
  • When writing, multiple threads can write the data to different files, and the files can then be merged into a single one. Even when you write to a single disk, OS and disk buffering and per-operation overhead mean that sending X operations concurrently can finish faster than executing X operations sequentially.
  • Each worker/thread can write a new record directly. A parser would need access to all records to generate the file.
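As a rough illustration of the points above, here is a minimal sketch in Python (chosen only as a neutral language; the sample data and the `parse_line` helper are made up for the example). It splits the text on newlines without any JSON parsing and then hands each line to a worker. The thread pool is there to show that lines are independent, not as a performance claim.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: one complete JSON object per line.
ndjson_text = (
    '{"id": 1, "name": "a"}\n'
    '{"id": 2, "name": "b"}\n'
    '{"id": 3, "name": "c"}\n'
)


def parse_line(line: str) -> dict:
    # Each line is a complete JSON document, so it can be parsed on its own.
    return json.loads(line)


# Partitioning by line needs no JSON parsing at all.
lines = [line for line in ndjson_text.splitlines() if line.strip()]

with ThreadPoolExecutor(max_workers=2) as pool:
    # map() returns results in input order even though work runs concurrently.
    records = list(pool.map(parse_line, lines))
```

A single large JSON array could not be split this way, because finding the record boundaries would itself require parsing.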

For this reason, it is not a good idea to use a parser to generate such files, even if the parser supported it. A single-threaded implementation would simply be too slow, and would force you to collect all records before writing them out.

To improve performance, you could write to multiple files, preferably on separate disks, and combine all files into one at the end. You could also write each record as it is generated, instead of waiting to load all of them into memory before writing them out.
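A minimal sketch of both ideas, again in Python and under stated assumptions: `generate_records`, `write_records`, `merge_files` and the file names are hypothetical, and the two part files are written one after the other here although separate workers could each own one. In Json.Net the per-record step would presumably be a call such as `JsonConvert.SerializeObject(record)` followed by a newline.

```python
import json
import os
import shutil
import tempfile
from typing import Iterable, Iterator, List


def generate_records(start: int, count: int) -> Iterator[dict]:
    # Hypothetical record source that yields one record at a time.
    for i in range(start, start + count):
        yield {"id": i, "value": f"item-{i}"}


def write_records(path: str, records: Iterable[dict]) -> None:
    # Write each record as soon as it is produced; nothing is collected first.
    with open(path, "w", encoding="utf-8") as out:
        for record in records:
            # json.dumps without indent emits a single line per record.
            out.write(json.dumps(record) + "\n")


def merge_files(part_paths: List[str], merged_path: str) -> None:
    # Concatenating the parts is enough, since every line stands alone.
    with open(merged_path, "wb") as merged:
        for part in part_paths:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, merged)


work_dir = tempfile.mkdtemp()
part_paths = [os.path.join(work_dir, f"part-{n}.json") for n in range(2)]

# Each part could be written by a different worker; done in sequence here.
write_records(part_paths[0], generate_records(0, 2))
write_records(part_paths[1], generate_records(2, 2))

merged_path = os.path.join(work_dir, "merged.json")
merge_files(part_paths, merged_path)

with open(merged_path, encoding="utf-8") as f:
    merged_lines = f.read().splitlines()
```

Because the merge is plain concatenation, no step ever needs all records in memory at once.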

Panagiotis Kanavos
  • There is a format called exactly Newline Delimited JSON: http://ndjson.org/ – Koterpillar Oct 11 '17 at 04:45
  • @Koterpillar no, that's only the informal name given to something that is *not* JSON. It's JSON fragments, separated by newlines, used for the reasons above. The Github repo you link to was created long *after* people started using newlines. The only libraries mentioned are the author's own, never updated. That's not a format's site, that's a failed PR stunt – Panagiotis Kanavos Oct 11 '17 at 07:32
  • @Koterpillar and the legalistic "spec" is essentially useless. It's just links to the actual standard. All it says is "use newline as a separator". That's not what's used though. Newlines should *not* be used, except to separate entire records. – Panagiotis Kanavos Oct 11 '17 at 07:35
  • It might not be an IETF standard, and it might not have a large following, but it certainly exists. It also looks like at least a few libraries have been written by different people. We're using one of them in production and it is stable. – Koterpillar Oct 11 '17 at 09:06
  • @Koterpillar on the contrary, this particular format is VERY common, used by a lot of cloud and IoT products, supported by a LOT of libraries. It simply isn't related to that site, whose only purpose seems to be domain squatting. It's like patents. You *can't* claim to propose as a standard what people are already using – Panagiotis Kanavos Oct 11 '17 at 09:15
  • Well, I don't think there's a better name for it, and it is convenient to keep the information in one place so the implementations agree. There's nothing wrong with standardising an existing implementation already in use. – Koterpillar Oct 11 '17 at 09:51