Using C#:

I have a few hundred JSON files in nested folders in the file system. I need to run LINQ queries over the data in those files and find the JSON files whose data matches certain criteria.

I can simply deserialize all the JSON files into a List and then run my LINQ query on that list. However, this approach takes a lot of memory, since I am reading all the data from disk at once.

Is there any way to run my LINQ query on JSON files in the file system without loading all of them into memory?

Allan Xu
  • Do you need to process them all at once? – Jonathon Chase Jan 18 '19 at 02:05
  • I just need to run a query on them. For example, give me the list of productName values with a create date in Aug 2016. – Allan Xu Jan 18 '19 at 03:54
  • It is somewhat hard to write code that behaves the way you describe. Normally one would write `foreach(var item in GetAllItemsWithLinq()){ item...}`, which does not need to load more than one item at a time. Some clarification of why such an approach does not work in your case may narrow down the question. – Alexei Levenkov Jan 18 '19 at 03:58

2 Answers

You should be able to stream the data as described in the following posts, or something similar, which should help with the memory issues: "How to parse huge JSON file as stream in Json.NET?" and "Parsing large json file in .NET".
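For many small files, the same idea can be applied one file at a time. The following is only a minimal sketch, assuming Json.NET (Newtonsoft.Json) is referenced and that every file holds a single JSON object; the property names `productName` and `createDate`, and the names `JsonFileQuery`, `EnumerateJson` and `FindProductsCreatedIn`, are hypothetical and used purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class JsonFileQuery
{
    // Lazily yields one parsed document per file; only the file currently
    // being inspected is held in memory.
    public static IEnumerable<(string Path, JObject Doc)> EnumerateJson(string root)
    {
        foreach (var path in Directory.EnumerateFiles(root, "*.json", SearchOption.AllDirectories))
        {
            using (var file = File.OpenText(path))
            using (var reader = new JsonTextReader(file))
            {
                // Assumption: the root of each file is a JSON object.
                yield return (path, JObject.Load(reader));
            }
        }
    }

    // Deferred LINQ query: nothing is read until the result is enumerated.
    // "createDate" and "productName" are assumed property names.
    public static IEnumerable<(string Path, string ProductName)> FindProductsCreatedIn(
        string root, int year, int month)
    {
        return from item in EnumerateJson(root)
               let created = (DateTime?)item.Doc["createDate"]
               where created.HasValue
                     && created.Value.Year == year
                     && created.Value.Month == month
               select (item.Path, (string)item.Doc["productName"]);
    }

    public static void Main(string[] args)
    {
        var root = args.Length > 0 ? args[0] : ".";
        foreach (var hit in FindProductsCreatedIn(root, 2016, 8))
        {
            Console.WriteLine($"{hit.ProductName}\t{hit.Path}");
        }
    }
}
```

Because `Directory.EnumerateFiles` and the iterator are both lazy, each document becomes eligible for garbage collection as soon as the query moves on to the next file; only the small projected results accumulate if the caller stores them.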

verbal
  • I think my scenario is different. I have 900 small files. You are referring to one large file. – Allan Xu Jan 18 '19 at 03:56
  • You could still read each file/object individually, run your validation/analysis on it, then put whatever information into a separate List that you're tracking. At least that's how I was thinking of it. Doing it this way doesn't allow you to run a LINQ statement against ALL the objects at once, but you wouldn't have to put it all into memory at once. – verbal Jan 18 '19 at 18:04
  • Yes, and this is something that @Alexei Levenkov suggested. – Allan Xu Jan 18 '19 at 19:47

OK, NoSQL will not work for you, but here is a different solution that you could use.

Insert the files into a SQL database; then you can simply run SELECT statements against them.

Here is one way of doing it:

-- Load file contents into a variable
DECLARE @json NVARCHAR(MAX);
SELECT @json = BulkColumn
 FROM OPENROWSET (BULK 'C:\JSON\Books\book.json', SINGLE_CLOB) as j

-- Load file contents into a table (aliased so the column name matches the query below)
SELECT BulkColumn AS jsonInfo
 INTO #temp 
 FROM OPENROWSET (BULK 'C:\JSON\Books\book.json', SINGLE_CLOB) as j

Then use JSON_VALUE to read the data:

-- #temp holds only the jsonInfo column, so every field is read through JSON_VALUE
SELECT JSON_VALUE(jsonInfo,'$.info.address[0].town') AS Town
FROM #temp
WHERE JSON_VALUE(jsonInfo,'$.info.address[0].state') LIKE 'US%'
ORDER BY JSON_VALUE(jsonInfo,'$.info.address[0].town')

Here is how to import JSON files:

https://learn.microsoft.com/en-us/sql/relational-databases/json/import-json-documents-into-sql-server?view=sql-server-2017

And here is how to write WHERE clauses against them:

https://learn.microsoft.com/en-us/sql/t-sql/functions/json-value-transact-sql?view=sql-server-2017

Alen.Toma
  • That is a different application architecture. For my requirements, the JSON files need to stay in the file system along with other files. – Allan Xu Jan 18 '19 at 03:52
  • Do you know of any NoSQL library that lets me have a DB consisting of JSON files scattered across the file system? – Allan Xu Jan 18 '19 at 03:53
  • Updated my answer; read it and let me know if it works for you. It's not exactly LINQ. – Alen.Toma Jan 18 '19 at 13:45
  • Thank you for the update. The problem is that this completely changes the problem definition. I wanted to find if there is a good technique to do this in C# and you are adding a relatively big component (SQL Server) to the solution. – Allan Xu Jan 18 '19 at 19:43
  • It doesn't have to be SQL Server; SQLite will also work: https://www.sqlite.org/json1.html. EntityWorker.Core also supports SQLite, and with https://sqlitebiter.readthedocs.io/en/latest/pages/usage/file/ you can import JSON files into SQLite. – Alen.Toma Jan 19 '19 at 00:24