
I have very large JSON files that I would like to explore the structure of before processing. Some of these can be in excess of 4-5 GB, but I've picked a smaller one just for exploration purposes. The bonus difficulty here is that the files are minified, so all the data is on a single line.

I am attempting to do this with Get-Content in PowerShell 7 (x64), but am getting this error:

Get-Content: Exception of type 'System.OutOfMemoryException' was thrown.

My MaxMemoryPerShellMB is set to the Windows 11 default of 2147483647. If I monitor Task Manager while this runs, the pwsh process climbs to 3,599 MB and then throws the exception. For fun, I also lowered MaxMemoryPerShellMB to 8096 to put it within the boundaries of my machine (32 GB), but that obviously had no effect.

Any thoughts on why 64-bit PowerShell is throwing a max memory exception at the 32-bit limit? And is there a better method for loading this much data with PowerShell or is it just impossible?

The exact size of this file is 1,832,252,369 bytes.

  • Strings in .NET can't exceed 2 GB, and each character is stored as two bytes, so you'll hit a hard limit after reading ~1 billion characters. – Mathias R. Jessen Jul 13 '22 at 20:35
  • Using the JsonSerializer class and a StreamReader might be a good entry point (this is available in .NET Core, hence works in pwsh 7). [This answer](https://stackoverflow.com/a/43747641/15339544) shows you how, though I'm not sure it will work with a JSON that big – Santiago Squarzon Jul 13 '22 at 20:38
  • @Santiago, I feel like closing this one as it is actually a duplicate of the question/answer you refer to. Any reasons why this is not the case? – iRon Jul 14 '22 at 05:38
  • @iRon because it's C#, not PowerShell. Not sure if there is a PowerShell answer showing this – Santiago Squarzon Jul 14 '22 at 11:53
  • The error shouldn't occur in `Get-Content`, as this PowerShell cmdlet streams by itself and hardly uses any memory when used correctly. Meaning: do not assign the result to a variable or wrap it in parentheses, but pipe it directly to the next cmdlet: `Get-Content .\Test.json |ConvertFrom-Json`. I expect that you will now get an error in the `ConvertFrom-Json` cmdlet. In PowerShell 7, try using the `-AsHashTable` parameter. If it still fails, please show a part of the top structure (of a small example) in your question, e.g.: `Get-Content .\Test.json |ConvertFrom-Json |ConvertTo-Json -Depth 1` – iRon Jul 14 '22 at 12:30
  • See also: [Iterate though huge JSON in powershell](https://stackoverflow.com/a/59429757/1701026) – iRon Jul 14 '22 at 12:36
  • Use `[IO.File]::ReadLines('path\to\json.json')` instead of `Get-Content` for @iRon's test – Santiago Squarzon Jul 14 '22 at 13:23
  • Some of the files I'd like to peek into are over 200 GB, but I wanted to experiment with the data structure a bit. `[IO.File]::ReadLines` worked, though `ConvertTo-Json` grinds gears for a while and eventually caps out memory again. The overhead for objects must be huge, combined with this data structure. – HallucinationOrbit Jul 15 '22 at 06:08
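Pulling the comments' suggestions together, a minimal sketch of the "peek first, parse later" approach (the `Peek-JsonHead` function name and the placeholder path are my own, not part of any module): read only the first few kilobytes of the single-line file through a `StreamReader`, so the whole file never has to fit into one .NET string (which caps out near 2 GB / ~1 billion characters).

```powershell
# Sketch: inspect the start of a huge single-line JSON file without
# materializing the entire file as a string.
function Peek-JsonHead {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int] $Count = 4096   # number of characters to read from the start
    )
    $reader = [IO.StreamReader]::new($Path)
    try {
        $buffer = [char[]]::new($Count)
        # Read() fills at most $Count characters and returns how many it read,
        # so only this small buffer is ever allocated.
        $read = $reader.Read($buffer, 0, $Count)
        [string]::new($buffer, 0, $read)
    } finally {
        $reader.Dispose()
    }
}

# Hypothetical usage; replace the path with your own file:
# Peek-JsonHead -Path 'C:\data\big.json' -Count 2048
```

For actually processing the data, the comments' streaming advice applies: enumerate lazily with `[IO.File]::ReadLines('path\to\json.json') | ConvertFrom-Json -AsHashtable` rather than assigning `Get-Content` output to a variable, so nothing accumulates in memory before the pipeline runs. Note that for a minified file on a single line, `ReadLines` still yields that one line as a single string, so the 2 GB string limit ultimately remains; truly streaming parses of multi-GB JSON need a token-level reader.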

0 Answers