
I'm trying to process a 25 GB GeoJSON file using GeoJSON.Net.

The accepted answer here works on a small test file, but it throws an `OutOfMemoryException` with the 25 GB file.

There isn't much information about how to process a `FeatureCollection`, so I'm just looping through its `Features`.

Can anyone advise what I'm doing wrong?

CODE

    using System;
    using System.IO;
    using System.Text;
    using GeoJSON.Net.Feature;
    using Newtonsoft.Json;

    try
    {
        JsonSerializer serializer = new JsonSerializer();

        using (FileStream s = File.Open(jsonFile, FileMode.Open))
        using (StreamReader sr = new StreamReader(s, Encoding.UTF8))
        using (JsonReader reader = new JsonTextReader(sr))
        {
            while (reader.Read())
            {
                // deserialize only when there's "{" character in the stream
                // (the first "{" is the root object, so this reads the whole FeatureCollection)
                if (reader.TokenType == JsonToken.StartObject)
                {
                    FeatureCollection FC = serializer.Deserialize<FeatureCollection>(reader);
                    // Errors Here
                    foreach (var Feature in FC.Features)
                    {
                        if (Feature.Properties.ContainsKey("place"))
                        {
                            foreach (var p in Feature.Properties)
                            {
                                var Key = p.Key;
                                var Value = p.Value;
                                Console.WriteLine("Tags K: {0} Value: {1} ", Key, Value);
                            }
                        }
                    }
                }
            }
        }
    }
    catch (Exception e)
    {
        Console.WriteLine("Err: " + e.Message);
    }

The second answer on the same page isn't doing anything either. (BTW, `JsonReaderExtensions` is in a separate file copied from that page.)

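    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    // The pattern matches strings such as "[0]" or "[12]"
    Regex regex = new Regex(@"^\[\d+\]$");

    using (FileStream s = File.Open(jsonFile, FileMode.Open))
    using (StreamReader sr = new StreamReader(s, Encoding.UTF8))
    using (JsonReader reader = new JsonTextReader(sr))
    {
        // SelectTokensWithRegex comes from the copied JsonReaderExtensions file
        IEnumerable<FeatureCollection> objects = reader.SelectTokensWithRegex<FeatureCollection>(regex);

        foreach (var Feature in objects)
        {
            Console.WriteLine("Hello");
            // Doesn't get here
        }
    }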

Update:

I think the problem is with GeoJSON.Net rather than Newtonsoft.Json, because I've used the same method above to open bigger JSON files with `dynamic jsonFeatures = serializer.Deserialize<ExpandoObject>(reader);`.

Following the comment by @Martin Costello, I've come up with a workaround: I open the file with a standard `StreamReader`, read it line by line, and convert the filtered lines back into valid GeoJSON. I'm sure there must be a better way to do this?

    using System;
    using System.IO;
    using System.Text;
    using System.Text.RegularExpressions;
    using GeoJSON.Net.Feature;
    using Newtonsoft.Json;

    // Header and footer of the original file, used to wrap each matching line
    string Start = @"{
    ""type"": ""FeatureCollection"",
    ""name"": ""points"",
    ""crs"": { ""type"": ""name"", ""properties"": { ""name"": ""urn:ogc:def:crs:OGC:1.3:CRS84"" } },
    ""features"": [";

    string End = @"]
    }
    ";

    string CurrentLine = null; // last line read, for debugging
    try
    {
        using (StreamReader INfile = new StreamReader(jsonFile, Encoding.UTF8))
        {
            string line;
            while ((line = INfile.ReadLine()) != null)
            {
                CurrentLine = line;
                // Filter only the lines containing the word "place"
                if (Regex.Match(line, @"\bplace\b", RegexOptions.IgnoreCase).Success)
                {
                    // Rebuild a valid GeoJSON document around the single line
                    string json = Start + line + End;
                    FeatureCollection FC = JsonConvert.DeserializeObject<FeatureCollection>(json);
                    foreach (var Feature in FC.Features)
                    {
                        foreach (var p in Feature.Properties)
                        {
                            var Key = p.Key;
                            var Value = p.Value;
                            switch (Key.ToLower())
                            {
                                // Do Stuff
                            }
                        }
                    }
                }
            }
        }
    }
    catch (Exception e)
    {
        Console.OutputEncoding = System.Text.Encoding.UTF8;
        Console.WriteLine("Err: " + e.Message + "\n" + CurrentLine);
    }
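
If every feature really does occupy exactly one line (an assumption based on the header above, not something the file format guarantees), the `Start`/`End` wrapper isn't needed. Each matching line, minus its trailing comma, is already a complete GeoJSON `Feature` object. The following is a minimal sketch under that assumption. `LineFeatureReader` and `ReadPlaceFeatures` are hypothetical names. It keeps the same `\bplace\b` text pre-filter, then confirms the key with `ContainsKey("place")`, because the regex also matches "place" inside property values.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using GeoJSON.Net.Feature;
using Newtonsoft.Json;

public static class LineFeatureReader
{
    // Same whole-word, case-insensitive pre-filter as the loop above.
    static readonly Regex PlaceWord = new Regex(@"\bplace\b", RegexOptions.IgnoreCase);

    // Assumes one complete Feature object per line, optionally followed by
    // the comma that separates array elements.
    public static IEnumerable<Feature> ReadPlaceFeatures(TextReader input)
    {
        string line;
        while ((line = input.ReadLine()) != null)
        {
            // Cheap text test first, so most lines are never parsed as JSON.
            if (!PlaceWord.IsMatch(line))
                continue;

            // Strip surrounding whitespace and the trailing array separator.
            string json = line.Trim().TrimEnd(',');

            // Skip any matching line that is not a whole object.
            if (!json.StartsWith("{") || !json.EndsWith("}"))
                continue;

            Feature feature = JsonConvert.DeserializeObject<Feature>(json);

            // The regex also hits values such as "Place Royale"; keep only real keys.
            if (feature?.Properties != null && feature.Properties.ContainsKey("place"))
                yield return feature;
        }
    }
}
```

This avoids concatenating a header, the line, and a footer for every hit. It breaks as soon as a feature spans several lines, in which case a token-based reader such as `JsonTextReader` is the safer choice.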
Holly
  • The first code snippet is just finding the start of the entire object in the document and then trying to read it all, hence the OOM Exception. Consider trying to use `System.Text.Json` to deserialize itself as I _believe_ that better supports streaming scenarios, but I think either way you're going to need to manually iterate through the file more than just checking for an initial `{`, otherwise you'll be parsing too much of the file at once, leading back to an OOM exception. – Martin Costello Jun 10 '21 at 08:50
  • I only need the objects where `ContainsKey("place")` is true. Is there a way of filtering just those objects while streaming the file? I've just run a test: I can use a standard `StreamReader` on this file and append to an output file `if (line.Contains("place"))`. That's not a good idea, though, as it could corrupt any diacritics. – Holly Jun 10 '21 at 10:34
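
On the diacritics worry in the last comment: `StreamReader` decodes the bytes into .NET strings before `line.Contains("place")` ever runs. The filter therefore compares text, not raw bytes. A `StreamWriter` with a UTF-8 encoding then writes the kept lines back out with the same characters, provided the input really is UTF-8. The stdlib-only sketch below makes the encoding explicit on both sides. `PlaceLineFilter` and `CopyMatchingLines` are hypothetical names. If diacritics still look wrong afterwards, an encoding mismatch in whatever displays the output, such as the console, is a likely suspect.

```csharp
using System.IO;
using System.Text;

public static class PlaceLineFilter
{
    // Hypothetical helper: copies every line containing `needle` from
    // inputPath to outputPath and returns how many lines were copied.
    public static int CopyMatchingLines(string inputPath, string outputPath, string needle)
    {
        int copied = 0;

        // UTF-8 output without a byte-order mark.
        var utf8NoBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);

        // Decode as UTF-8 on the way in and encode as UTF-8 on the way out, so
        // characters such as "é" or "ü" round-trip; line endings are rewritten
        // as Environment.NewLine by WriteLine.
        using (var reader = new StreamReader(inputPath, Encoding.UTF8))
        using (var writer = new StreamWriter(outputPath, append: false, encoding: utf8NoBom))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Compares decoded strings, never partial byte sequences.
                if (line.Contains(needle))
                {
                    writer.WriteLine(line);
                    copied++;
                }
            }
        }
        return copied;
    }
}
```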
