I am trying to read one JSON record at a time into memory using .NET Core 3.0.

This page: https://devblogs.microsoft.com/dotnet/try-the-new-system-text-json-apis/

Gives this example using a reader:

byte[] data = Encoding.UTF8.GetBytes(json);
Utf8JsonReader reader = new Utf8JsonReader(data, isFinalBlock: true, state: default);

while (reader.Read())
{
    Console.Write(reader.TokenType);

    switch (reader.TokenType)
    {
        case JsonTokenType.PropertyName:
        case JsonTokenType.String:
        {
            string text = reader.GetString();
            Console.Write(" ");
            Console.Write(text);
            break;
        }

        case JsonTokenType.Number:
        {
            int value = reader.GetInt32();
            Console.Write(" ");
            Console.Write(value);
            break;
        }

        // Other token types elided for brevity
    }

    Console.WriteLine();
}

The example loads the entire byte array. My main concern is memory: I am dealing with large JSON files and don't want to load the whole file, just the record currently being worked on (or at least a smaller chunk).

I am not sure how to pass a byte stream to Utf8JsonReader and read one record at a time.

What is a simple way to read one record at a time with .NET Core 3.0?

dbc
Guerrilla
  • As JSON "wraps" large parts of the full JSON body, I doubt it's possible to read it line by line. Maybe you could save it in smaller chunks instead, e.g. only 10k items per file. With some work you could probably also chop it into smaller pieces by finding matching braces and parsing the content in between. – Charles Jan 08 '20 at 23:03
  • There is no `JsonTextReader` equivalent? – Guerrilla Jan 08 '20 at 23:12
  • Is your input a simple array of multiple items, or is it buried inside other elements? If it's a simple array, you might be able to read in each element char by char (each starts and finishes with `{ }`), but other than that, I doubt it's possible. – Jawad Jan 08 '20 at 23:12
  • I don't know about a simpler way, but if you really require your solution to iterate through each object, it may be best to match with a regex on opening and closing `{ }`. – AbdulG Jan 08 '20 at 23:27
  • @AbdulG to do the regex, you'll have to read in the entire data, no? – Jawad Jan 08 '20 at 23:31
  • @Jawad you can specify ^ and $ to start looking at the start or end of the string. – AbdulG Jan 08 '20 at 23:51
  • @AbdulG no, you read line by line into a buffer; `StreamReader` is an easy facade to do this with. I could do it by counting opening and closing brackets, but it is surprising the new library does not have this built in – Guerrilla Jan 08 '20 at 23:52
  • @Guerrilla then you may want to look into newline-delimited JSON or JSON Lines to achieve this http://jsonlines.org/ – AbdulG Jan 09 '20 at 00:03
  • If you go line by line then you expect the JSON to be formatted properly; anything moving up and down will break it. – Jawad Jan 09 '20 at 00:19
  • I found an example of how to do what I want here: https://stackoverflow.com/questions/54983533/parsing-a-json-file-with-net-core-3-0-system-text-json – Guerrilla Jan 09 '20 at 22:55

1 Answer


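One way to achieve this is to store the data in the JSON Lines format (newline-delimited JSON) and read it with the StreamReader class. JSON Lines files conventionally use the .jsonl extension.

In this format each JSON object is written on a single line, terminated by a newline character. You can then call StreamReader.ReadLine and deserialize each line on its own, so only the current record needs to be held in memory. This only helps if you control how the file is produced, or can convert it to this layout first.

See JSON Lines (http://jsonlines.org/) for more details.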
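
As a rough illustration of the line-by-line approach, here is a minimal sketch using System.Text.Json. The `Item` type, the `JsonLinesReader` helper, and the sample data are hypothetical names made up for this example; it also assumes every record fits on exactly one line. In real use you would pass a `StreamReader` over the file instead of the in-memory `StringReader` shown here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical record type, for illustration only; use your own shape.
public class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class JsonLinesReader
{
    // System.Text.Json matches property names case-sensitively by default,
    // so opt in to case-insensitive matching for lower-case JSON keys.
    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        PropertyNameCaseInsensitive = true
    };

    // Lazily yields one deserialized record per non-empty line,
    // so only the current line is materialized at any time.
    public static IEnumerable<T> ReadRecords<T>(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (string.IsNullOrWhiteSpace(line))
                continue; // tolerate blank lines between records

            yield return JsonSerializer.Deserialize<T>(line, Options);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Assumed sample input: two records, one per line.
        string jsonl = "{\"id\":1,\"name\":\"apple\"}\n{\"id\":2,\"name\":\"banana\"}\n";

        // For a file on disk, replace StringReader with new StreamReader(path).
        using (var reader = new StringReader(jsonl))
        {
            foreach (Item item in JsonLinesReader.ReadRecords<Item>(reader))
                Console.WriteLine($"{item.Id}: {item.Name}");
        }
    }
}
```

Because `ReadRecords` is an iterator, nothing is read until the `foreach` asks for the next item, which is what keeps memory use down to roughly one line at a time.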

AbdulG
  • There's no such file format, extension or standard. What you linked to is just one of many attempts by people to hijack the common practice of streaming unindented JSON records per line as their own invention and push their products, services, company or whatever. In the past that site was built to look the same as json.org, with which it has absolutely no relation. You'll find mentions of NDJSON, JSON Lines, JSON NL, Streaming JSON all over the place. – Panagiotis Kanavos Jul 12 '22 at 07:26
  • The *real* standard is [JSON Text Sequences described in IETF RFC7464](https://datatracker.ietf.org/doc/html/rfc7464) with an actual MIME type `application/json-seq`. – Panagiotis Kanavos Jul 12 '22 at 07:37