I receive a huge JSON data file, around 5 to 8 GB in size, that consists of company info followed by an array of departments with their employee details. Each file contains exactly one company.

I am trying to deserialize this into a C# object. I cannot load the whole file into a string because that throws an out-of-memory exception.

I am using Newtonsoft Json.NET.

Sample data

{
"Companyname": "ABC Company",
"email": "info@abc.com",
"location": "NYC",
"department": [
    {
        "deptid": "15345",
        "deptname":"dept1",
        "projects": ["25A","26B","26C"],

        "employees":
          [
            {
             "empid": "1",
             "name":"john",
             "groupnumber":[234234,34243,343242,2342342]
            },
            {
             "empid": "2",
             "name":"Joseph",
             "groupnumber":[13245646,78945651,45641546,78978979]
            }
          ]
    },
    {
        "deptid": "5654",
        "deptname":"dept2",
        "projects": ["125A","226B","26CD"],

        "employees":
          [
            {
             "empid": "11",
             "name":"Jill",
             "groupnumber":[13224231,123133333,8765433,213132333]
            },
            {
             "empid": "122",
             "name":"Don",
             "groupnumber":[12344,123123234]
            }
          ]
    }   
]}

Classes

public class CompanyDetails
{
    public string companyName{ get; set; }
    public string email { get; set; }
    public string location { get; set; }
    public List<Department> department { get; set; }
}

public class Department
{
    public string deptname { get; set; }
    public int deptid{ get; set; }
    public List<Project> projects{ get; set; }
    public List<Employee> employees{ get; set; }
}

public class Project
{
    public string projectReference { get; set; }
}

public class Department
{
    public int empid { get; set; }
    public string name { get; set; }
    public List<GroupNumber> groupnumber { get; set; }
}

public class GroupNumber
{
    public long grpnumber { get; set; }
}

Below is my C# code. It does not throw any error, but the companyData object is empty.

using (StreamReader reader = new StreamReader(file.FullName))
{
    var serializer = new JsonSerializer();
    CompanyDetails companyData = (CompanyDetails)serializer.Deserialize(reader, typeof(CompanyDetails));
}

Any help is much appreciated.

  • Can't reproduce with the JSON & code shown, see https://dotnetfiddle.net/j39QQl. Are you catching and swallowing exceptions (e.g. an `OutOfMemoryException`) somewhere not shown in your question? – dbc Aug 09 '22 at 04:17
  • That being said your streaming approach is correct; it's what is recommended by the [newtonsoft docs](https://www.newtonsoft.com/json/help/html/Performance.htm#MemoryUsage). If you are running out of memory anyway you may need to process the `employeeDetails` records via a streaming algorithm rather than via deserialization, e.g. as shown in [Deserializing large json from WebService using Json.NET](https://stackoverflow.com/q/67608176/3744182). – dbc Aug 09 '22 at 04:28
  • If there is a chance that some first chance exception is being caught and swallowed you could [enable first chance exceptions](https://stackoverflow.com/q/564681) in visual studio to make sure this is not happening. – dbc Aug 09 '22 at 04:33
  • OK, now I'm getting an exception *`Newtonsoft.Json.JsonSerializationException: Error converting value "25A" to type 'Project'. Path 'department[0].projects[0]', line 9, position 26.`*, see https://dotnetfiddle.net/8SZjZp. Are you getting that also? Also you have `Department` defined twice, I am assuming that the second definition is actually for `Employee`. – dbc Aug 09 '22 at 04:35
  • Do you possibly have more department entries in the file than a list can store? That would throw an OOM. If you can provide the full exception and stack trace, it should help here. See [this question](https://stackoverflow.com/questions/13520956/i-hit-an-outofmemoryexception-with-liststring-is-this-the-limit-or-am-i-miss) for more info on that. Anyway, the solution will be to go lower level and use `JsonTextReader` to read the individual parts of the file instead. [Docs](https://www.newtonsoft.com/json/help/html/readjsonwithjsontextreader.htm) – ProgrammingLlama Aug 09 '22 at 04:48
  • If speed is more important than anything else: https://github.com/EgorBo/SimdJsonSharp – Jeremy Lakeman Aug 09 '22 at 05:02
  • Which bit of the JSON is the part that gets really big? Is it the root `department` array or is it one of the inner properties? – Charlieface Aug 09 '22 at 10:53

1 Answer

You have to fix the classes so that they match the JSON. For example, "projects" in the JSON is an array of plain strings, not objects, so

public List<Project> projects { get; set; }

should be

public List<string> projects { get; set; }

I've been using these classes and everything works properly:


public class CompanyDetails
{
    public string Companyname { get; set; }
    public string email { get; set; }
    public string location { get; set; }
    public List<Department> department { get; set; }
}

public class Department
{
    public string deptid { get; set; }
    public string deptname { get; set; }
    public List<string> projects { get; set; }
    public List<Employee> employees { get; set; }
}

public class Employee
{
    public string empid { get; set; }
    public string name { get; set; }
    public List<int> groupnumber { get; set; }
}
Serge
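
If the file is still too large to hold as a single CompanyDetails object, the comments suggest going lower level with JsonTextReader. The following is only a minimal sketch of that idea, not part of the original answer: the DepartmentStreamer class and its ReadDepartments method are hypothetical names, and it assumes the root "department" array is the part that grows large. It yields one Department at a time and ignores the company-level fields. Note that groupnumber is declared as List<long> here (as in the question) rather than List<int>, on the assumption that real values may exceed the int range.

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Same shape as the corrected classes in the answer, except that
// groupnumber uses long (an assumption, see the note above).
public class Employee
{
    public string empid { get; set; }
    public string name { get; set; }
    public List<long> groupnumber { get; set; }
}

public class Department
{
    public string deptid { get; set; }
    public string deptname { get; set; }
    public List<string> projects { get; set; }
    public List<Employee> employees { get; set; }
}

// Hypothetical helper: streams departments one by one instead of
// materializing the whole document.
public static class DepartmentStreamer
{
    public static IEnumerable<Department> ReadDepartments(TextReader text)
    {
        var serializer = new JsonSerializer();
        using (var reader = new JsonTextReader(text))
        {
            while (reader.Read())
            {
                // Find the root-level "department" property.
                if (reader.TokenType == JsonToken.PropertyName
                    && reader.Path == "department")
                {
                    reader.Read(); // move to the StartArray token

                    // Each StartObject inside the array is one department.
                    while (reader.Read()
                           && reader.TokenType == JsonToken.StartObject)
                    {
                        // Deserializes only this department and leaves
                        // the reader on its closing EndObject token.
                        yield return serializer.Deserialize<Department>(reader);
                    }
                }
            }
        }
    }
}
```

It could be called with `new StreamReader(file.FullName)` inside a `foreach` loop, processing and discarding each department before the next one is read. If a single department's employees array is itself too large, the same pattern would have to be applied one level deeper.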