
I'm creating a console application that converts a big JSON file (around 900 MB) into a different format (the result will be written out to a new JSON file).

My objective is to make sure each output file doesn't exceed 10 MB. I think I need to check the size of the `JObject` before writing it to a JSON file: if adding the current object would push the `JObject` over 10 MB, that object is not added, the `JObject` (< 10 MB) is written to a JSON file, and the subsequent objects are added to a new `JObject` until it reaches the 10 MB limit again.

So, the question is: how can I determine the size (not the length/count) of the `JObject`? I tried converting it into a byte array, summing all the bytes, and dividing by (1024 × 1024), but it doesn't work.
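To be concrete, something like this is what I'm after (a rough sketch using Json.NET; `GetSerializedSizeInBytes` is just a placeholder name for the part I can't get right):

```csharp
using System.Text;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Placeholder helper: measure how many bytes the JObject would occupy
// when written to disk as UTF-8 JSON (not its size in memory).
static long GetSerializedSizeInBytes(JObject obj)
{
    string json = obj.ToString(Formatting.None); // serialize without indentation
    return Encoding.UTF8.GetByteCount(json);     // UTF-8 byte count of the output
}

// e.g. stop adding items once this goes past 10 MB:
// bool overLimit = GetSerializedSizeInBytes(batch) > 10 * 1024 * 1024;
```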

  • Size in memory correlates loosely with size on disk and will vary depending on where your code is running and how the storage is organised. The same JSON stored in some cloud block storage will have a different size to that stored in a mirrored RAID array on a Windows server. You'd probably have more success just arbitrarily breaking the JSON into batches, using a pre-calculated, estimated size based on something you can count. – Jodrell Aug 10 '22 at 09:50
  • When you consider other factors, like encoding at multiple levels, it soon gets very hard to work out. A Monte-Carlo approach of storing some examples with known sizes and seeing how much space they take will give you a better handle on the batch size you require. – Jodrell Aug 10 '22 at 09:52
  • As for getting the size in memory, https://stackoverflow.com/a/207605/659190 – Jodrell Aug 10 '22 at 10:10
  • @Jodrell In .NET's case, it will require _double_ the memory of a UTF-8 JSON file because .NET represents strings in-memory using UTF-16, and certain JSON libraries (such as `System.Text.Json`) represent the entire JSON file as a single in-memory string and use cheap substrings (`ReadOnlyMemory` and `ReadOnlySpan`) to represent values-within. So I think the OP will likely hit .NET's 2GB object size limit (even on x64) - so what they're doing might be impossible _using that approach_. – Dai Aug 10 '22 at 11:20
  • @Dai The OP refers to `JObject`, which is Json.NET. Some JSON that is ~10mb when persisted is unlikely to be 2GB in memory. However, 900mb may be harder; some sort of Stream/Pipeline approach is likely best (see the sketch after these comments). – Jodrell Aug 10 '22 at 12:13
  • *I tried converting it into a byte array, summing all the bytes, and dividing by (1024 × 1024), but it doesn't work.* -- then please [edit] your question to share a [mcve], specifically the code that generates the incorrect size. – dbc Dec 06 '22 at 17:51
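
For illustration, here is a minimal sketch of the streaming/batching idea raised in the comments above. It assumes the 900 MB input is one large JSON array of objects; the file names, the 10 MB constant, and the method name `SplitIntoBatches` are all placeholders, and the actual per-object conversion is omitted:

```csharp
using System.IO;
using System.Text;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

static void SplitIntoBatches(string inputPath)
{
    const long maxBatchBytes = 10L * 1024 * 1024; // 10 MB per output file

    using var reader = new JsonTextReader(new StreamReader(inputPath));
    var batch = new JArray();
    long batchBytes = 2;   // rough allowance for the enclosing "[" and "]"
    int fileIndex = 0;

    while (reader.Read())
    {
        // Only top-level objects reach this point; JObject.Load consumes nested tokens.
        if (reader.TokenType != JsonToken.StartObject)
            continue;

        // Load one element at a time instead of the whole 900 MB document.
        JObject item = JObject.Load(reader);
        long itemBytes = Encoding.UTF8.GetByteCount(item.ToString(Formatting.None)) + 1; // +1 for a comma

        // Flush the current batch before it would exceed the limit.
        if (batch.Count > 0 && batchBytes + itemBytes > maxBatchBytes)
        {
            File.WriteAllText($"output_{fileIndex++}.json", batch.ToString(Formatting.None));
            batch = new JArray();
            batchBytes = 2;
        }

        batch.Add(item);
        batchBytes += itemBytes;
    }

    if (batch.Count > 0)
        File.WriteAllText($"output_{fileIndex}.json", batch.ToString(Formatting.None));
}
```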

0 Answers