1

I usually use Newtonsoft's Json.NET for producing JSON files. Currently, I'm looking into importing a JSON-formatted data file into Google BigQuery. However, BigQuery requires a JSON file with newline as the delimiter (also known as JSON Lines), which is a bit different from the usual tree-style output from Json.NET. I've been searching the Internet for how to do this but have been unsuccessful in finding anything useful. Is this possible with Newtonsoft's or any other popular JSON serializer?

OnionJack
  • I think it just wants a JSON array? no? https://cloud.google.com/bigquery/docs/loading-data-local. Json.Net example https://www.newtonsoft.com/json/help/html/SerializingCollections.htm – screig Mar 05 '18 at 10:41
  • Take a look at [Serialize as NDJSON using Json.NET](https://stackoverflow.com/q/44787652/3744182) for serialization and [Line delimited json serializing and de-serializing](https://stackoverflow.com/q/29729063/3744182) for deserialization. – dbc Mar 05 '18 at 11:14
  • @dbc NDJSON is a domain grab, not a specification for newline-delimited JSON. It appeared only 1 or 2 years ago. Services like AWS, Google, and Azure have been using such files for years. That grab is polluting search results, which makes finding answers *harder* than it was just 1 year ago – Panagiotis Kanavos Mar 05 '18 at 12:19
  • @OnionJack big data services don't expect newline-delimited files because it's some kind of standard. They expect them because they *scale* and allow streaming operations - when you have 1GB of records you *don't* want to have to parse the entire string. The services are able to split records immediately when they encounter a newline, partition them and use different machines to process them. – Panagiotis Kanavos Mar 05 '18 at 12:24
  • @OnionJack you don't need to *cache* the entire string before writing it out either. Imagine creating a 1GB string in memory, or caching 1GB of data in order to create it and then having to GC all that. You **have** to write individual records one by one if you want your application to be able to handle large files. The best solution is to serialize each object in a loop and write it out immediately to the output file. – Panagiotis Kanavos Mar 05 '18 at 12:26
  • @OnionJack check Wikipedia's entry on [JSON Streaming](https://en.wikipedia.org/wiki/JSON_streaming) for more – Panagiotis Kanavos Mar 05 '18 at 12:28
  • Hi @dbc, thanks! I implemented something similarly as shown in the first link. – OnionJack Mar 05 '18 at 15:17
  • @PanagiotisKanavos, thanks for the warning. – OnionJack Mar 05 '18 at 15:18
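The streaming approach described in the comments above can be sketched with Json.NET roughly as follows (a minimal sketch; the `Record` type and `NdjsonWriter` name are placeholders, not part of any library):

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

public class Record
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class NdjsonWriter
{
    // Serialize each record onto its own line and write it out
    // immediately, instead of building the whole document in memory.
    public static void Write(IEnumerable<Record> records, TextWriter writer)
    {
        var serializer = JsonSerializer.CreateDefault();
        foreach (var record in records)
        {
            serializer.Serialize(writer, record);
            writer.WriteLine(); // newline delimiter between records
        }
    }
}
```

Because each record is serialized and flushed in the loop, memory usage stays flat regardless of how many records the file ends up containing.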

2 Answers

1

I have no idea how I wasn't able to find the links provided by dbc in his comment:

Take a look at Serialize as NDJSON using Json.NET for serialization and Line delimited json serializing and de-serializing for deserialization. – dbc 3 hours ago

I did the naïve version where I built up a StringBuilder object, but that's naturally not suitable for large amounts of data.

However, I agree with Panagiotis Kanavos on the following comment:

That grab is polluting search results which makes finding answers harder than it was just 1 year ago

NDJSON, JSON Lines, newline delimited JSON... Not so sure which name to go by. :)
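Whatever the name, the reading side follows the same line-at-a-time pattern as the linked answers (a sketch; `NdjsonReader` is a placeholder name, and `JObject` is used so the snippet works without a concrete model type):

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json.Linq;

public static class NdjsonReader
{
    // Parse newline-delimited JSON one record per line, so the
    // whole file never needs to be held in memory at once.
    public static IEnumerable<JObject> Read(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Trim().Length == 0) continue; // skip blank lines
            yield return JObject.Parse(line);
        }
    }
}
```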

OnionJack
-1

As seen in the JSON spec at www.json.org, newlines inside a string are encoded as \n in JSON.

Railroad Diagram for JSON String

JSON.Net is not introducing newline characters into the output arbitrarily; rather, your original string already contains newline characters, so they have to be encoded. You can also try

JsonConvert.SerializeObject(retPair.publicKey.Replace("\n",""))
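To illustrate the escaping point, a quick check (a minimal sketch, not specific to the asker's `retPair` object):

```csharp
using System;
using Newtonsoft.Json;

class Demo
{
    static void Main()
    {
        // A literal newline inside a string value is escaped as the
        // two characters \n, so it never collides with the newline
        // that delimits records in a newline-delimited JSON file.
        string json = JsonConvert.SerializeObject("line1\nline2");
        Console.WriteLine(json); // prints "line1\nline2" (quotes included, newline escaped)
    }
}
```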