I have a case where I'm being sent a large file with JSON data in it. Unfortunately it has a small line of overhead string data at the start of the file and another line of overhead string data at the end of the file. I was previously reading in the file data manually into a string, and removing this data in memory, but the size has become so large that I can no longer do this.

I now need to use the JSON Object deserializer that reads from a stream, but because of this bloated overhead data this will fail.

So I need to remove this "bloat".

One option is to simply rewrite the file, excluding the bloat, and then use the new file. However, the file is now over 1.5 GB, so this would add a lot of overhead.

A second option is to create an inherited FileStream class that hides this bloat, effectively removing the bad overhead data from the stream while still streaming the remaining data to the JSON deserializer (this seems complex and annoying).

Is there an easy way to do this I am missing before I undertake one of these sort of annoying options?

Example file data...

HDR ZREOF100B   013 20220129    084455
{
  "CUSTOMER_DATA": [
... a lot of JSON data ...
     ]
}
TRL ZREOF100B         551

First and last line is basically the "Bloat" I'm referring to.

DarrenMB
  • You could modify [this answer](https://stackoverflow.com/a/48493067/3744182) to [How to deserialize a JSONP response (preferably with JsonTextReader and not a string)?](https://stackoverflow.com/q/48470971/3744182) to skip **until** you encounter a `{`, then ignore trailing garbage as per [Discarding garbage characters after json object with Json.Net](https://stackoverflow.com/q/37172263/3744182). Does that answer your question? Discarding trailing "bloat" is implemented by Json.NET, but it isn't clear from your question exactly how you characterize the leading "bloat". – dbc Feb 01 '22 at 00:23
  • *First and last line is basically the "Bloat" I'm referring to.* -- If so, you could just create a `StreamReader` and call `streamReader.ReadLine()` to trim the leading bloat before passing it to the [`JsonTextReader` constructor](https://www.newtonsoft.com/json/help/html/M_Newtonsoft_Json_JsonTextReader__ctor.htm), then ignore the trailing bloat as mentioned above. – dbc Feb 01 '22 at 14:24
  • Thank you so much for the ideas. Gonna try a mixed approach: writing spaces over the ending bloat directly in the file (this is easy and fast, and whitespace should be ignored by the parser), then using your "ReadLine" to advance the stream past the first bloat, which should allow the JSON object to be loaded correctly via a simple StreamReader. – DarrenMB Feb 01 '22 at 15:37
  • Setting `JsonSerializerSettings.CheckAdditionalContent = false` is surely easier than writing spaces over the end of the file, isn't it? – dbc Feb 01 '22 at 15:56
  • Yes, got it working. The easiest solution was "ReadLine" to skip the start plus `Serializer.CheckAdditionalContent = false` to skip the end. No file changes necessary. Thanks again. I don't use JSON much day to day. – DarrenMB Feb 01 '22 at 16:18

1 Answer

Following the suggestions from @dbc, this is my final working parsing routine, for anyone with a similar issue.

Using fs As New IO.FileStream(fi.FullName, IO.FileMode.Open, IO.FileAccess.Read, IO.FileShare.Read) ' read-only access is enough; the file is not modified.
    Dim enc As Text.Encoding = Text.Encoding.GetEncoding(1252) ' Windows-1252: ASCII plus the extended range (128-255) for accented characters.
    Using sr As New IO.StreamReader(fs, enc)
        sr.ReadLine() ' advance past first line of garbage.
        Using jtr As New Json.JsonTextReader(sr)
            Dim ser As New Json.JsonSerializer
            ser.CheckAdditionalContent = False ' should ignore the bloat after the JSON object ends.
            Return ser.Deserialize(jtr)
        End Using
    End Using
End Using
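
To check the behaviour without the full 1.5 GB file, the same two steps (`ReadLine` to skip the header, `CheckAdditionalContent = False` to ignore the trailer) can be tried against an in-memory string. This is only a minimal sketch: it assumes the Newtonsoft.Json package is referenced, the `Demo` module name is hypothetical, and the JSON values in the sample are made up to match the shape shown in the question.

```vbnet
Imports System
Imports System.IO
Imports Newtonsoft.Json
Imports Newtonsoft.Json.Linq

Module Demo
    Sub Main()
        ' Sample in the shape from the question; the JSON values are invented.
        Dim sample As String =
            "HDR ZREOF100B   013 20220129    084455" & vbLf &
            "{ ""CUSTOMER_DATA"": [ { ""ID"": 1 }, { ""ID"": 2 } ] }" & vbLf &
            "TRL ZREOF100B         551"

        Using sr As New StringReader(sample)
            sr.ReadLine() ' skip the header line before the JSON starts
            Using jtr As New JsonTextReader(sr)
                Dim ser As New JsonSerializer()
                ' Do not complain about the trailer line after the root object.
                ser.CheckAdditionalContent = False
                Dim root As JObject = ser.Deserialize(Of JObject)(jtr)
                Dim customers As JArray = DirectCast(root("CUSTOMER_DATA"), JArray)
                Console.WriteLine(customers.Count) ' number of entries in the array
            End Using
        End Using
    End Sub
End Module
```

Swapping the `StringReader` for the `StreamReader` over the `FileStream` above gives the routine from this answer; nothing else needs to change, because `JsonTextReader` accepts any `TextReader`.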
DarrenMB