
I'm using a web application to display GeoJSON on Google Maps. I don't want to render everything, so I use an algorithm to load only specific portions of the GeoJSON. The files are fairly large, ranging from 5 MB to 84 MB, with a total size of 227 MB.

[HttpGet]
public async Task<IHttpActionResult> GetCurrentElectoralDistrict([FromUri] Point point)
{
    //Start all five lookups concurrently; the files are independent.
    var task1 = GetCurrentFeature(point, "geojson1");
    var task2 = GetCurrentFeature(point, "geojson2");
    var task3 = GetCurrentFeature(point, "geojson3");
    var task4 = GetCurrentFeature(point, "geojson4");
    var task5 = GetCurrentFeature(point, "geojson5");

    await Task.WhenAll(task1, task2, task3, task4, task5);

    var result1 = await task1;
    var result2 = await task2;
    var result3 = await task3;
    var result4 = await task4;
    var result5 = await task5;

    //Modify features

}

private async Task<GeoJSON.Net.Feature.Feature> GetCurrentFeature(Point point, string geoJsonFileName)
{
    var path = HostingEnvironment.MapPath("~/");

    using (var reader = File.OpenText($"{path}\\GIS\\{geoJsonFileName}.geojson"))
    {
        //This is the only await in the method, so removing it and
        //using ReadToEnd makes the method run synchronously.
        var json = await reader.ReadToEndAsync();
        //Handle json
    }
    //Same result using a BufferedStream:
    //using (FileStream fs = File.Open($"{path}\\GIS\\{geoJsonFileName}.geojson", FileMode.Open, FileAccess.Read, FileShare.Read))
    //using (BufferedStream bs = new BufferedStream(fs))
    //using (StreamReader sr = new StreamReader(bs))
    //{
    //    var json = await sr.ReadToEndAsync();
    //}
}

This method is pretty slow because of the number and size of the files, and since the lookups are independent of each other I tried to run them asynchronously. However, `ReadToEndAsync` is nearly 2.5 times slower than `ReadToEnd`. I can shave some time off by using a `BufferedStream`, but it is still slower than just running the code synchronously. Why is this, and how can I make it perform better?

Get request in seconds:
+----------------+----------------+------------------------------------+
|   ReadToEnd    | ReadToEndAsync | ReadToEndAsync with BufferedStream |
+----------------+----------------+------------------------------------+
| 20.84          | 52.60          | 29.65                              |
| 19.87          | 51.03          | 29.64                              |
| 20.51          | 49.69          | 29.42                              |
+----------------+----------------+------------------------------------+
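
One commonly cited explanation for numbers like these: `File.OpenText` opens its underlying `FileStream` for synchronous I/O with a small buffer, so `ReadToEndAsync` ends up faking asynchrony over many small blocking reads, paying task overhead on each one. A minimal sketch of the usual mitigation, opening the stream explicitly for asynchronous I/O with a bigger buffer (the helper name and the 64 KB size are illustrative assumptions, not tuned values):

private async Task<string> ReadFileAsync(string filePath)
{
    //useAsync: true requests real overlapped I/O instead of synchronous
    //reads wrapped in tasks; the larger buffer cuts the number of awaited reads.
    using (var fs = new FileStream(filePath, FileMode.Open, FileAccess.Read,
                                   FileShare.Read, bufferSize: 64 * 1024, useAsync: true))
    using (var sr = new StreamReader(fs))
    {
        return await sr.ReadToEndAsync();
    }
}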
Ogglas
  • Your hard drive only has one disk head (unless this data is stored on multiple different drives). Asking for 5 different things at the same time isn't going to get the work done any faster than asking for one item, then the next, and so on. – Servy Mar 08 '18 at 20:05
  • @Servy True, but why is the async method so much slower? Is it really that much overhead? And is there any good way to load these files into RAM and access them directly? (see the caching sketch after these comments) – Ogglas Mar 08 '18 at 20:07
  • If 5 of your co-workers had a question for you, do you think you'd answer their questions faster if they all came to your desk at once and started asking at the same time, or if they took turns and waited for the previous person to finish before asking? Trying to parallelize an operation that inherently needs to be done sequentially is *always* going to be slower. If you want to sequentially load all of the files into memory and then process them in parallel, then sure, just separate the code that reads the files from the code that processes them (see the sequential-then-parallel sketch below). – Servy Mar 08 '18 at 20:11
  • Depending on what `// handle json` does, you may have better results overall with a streaming solution that reads from the file, manipulates what it has on the fly, and then streams it out to Maps, without first doing the expensive allocations involved in translating a bunch of text into a pile of objects. Of course, this does require that your algorithm needs only a minimal amount of backtracking (see the streaming sketch below). – Jeroen Mostert Mar 08 '18 at 20:12
  • Imagine what the disk head has to do in your parallel version: it moves to file1's location, reads a small portion, then moves to file2 (which might be far away), reads a small portion, and so on. It basically shuttles back and forth between 5 files, reading small chunks. The sequential version needs to make big moves (between files) only 5 times; the rest is sequential movement. – Evk Mar 08 '18 at 20:26
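
On Ogglas's follow-up about keeping the files in RAM: since the GeoJSON is static, one option is to read and parse each file once and cache the result for the lifetime of the process (at the cost of holding roughly 227 MB of data, plus object overhead, in memory). A sketch assuming GeoJSON.Net's Newtonsoft-based deserialization; `_featureCache` and `GetCachedCollectionAsync` are illustrative names, not from the question:

//using System.Collections.Concurrent; using GeoJSON.Net.Feature; using Newtonsoft.Json;
private static readonly ConcurrentDictionary<string, Lazy<Task<FeatureCollection>>> _featureCache =
    new ConcurrentDictionary<string, Lazy<Task<FeatureCollection>>>();

private static Task<FeatureCollection> GetCachedCollectionAsync(string geoJsonFileName)
{
    //Lazy<Task<T>> guarantees each file is read and parsed at most once,
    //even when several requests miss the cache at the same moment.
    return _featureCache.GetOrAdd(geoJsonFileName, name =>
        new Lazy<Task<FeatureCollection>>(async () =>
        {
            var path = HostingEnvironment.MapPath($"~/GIS/{name}.geojson");
            using (var reader = File.OpenText(path))
            {
                var json = await reader.ReadToEndAsync();
                return JsonConvert.DeserializeObject<FeatureCollection>(json);
            }
        })).Value;
}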
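
A sketch of the split Servy describes: do the disk reads one after another, then fan the CPU-bound lookups out in parallel. `FindFeatureContainingPoint` is a hypothetical stand-in for whatever `//Handle json` does:

var names = new[] { "geojson1", "geojson2", "geojson3", "geojson4", "geojson5" };
var root = HostingEnvironment.MapPath("~/GIS");

//Sequential I/O: reading one file at a time lets the disk read contiguously.
var jsonByName = names.ToDictionary(
    name => name,
    name => File.ReadAllText(Path.Combine(root, name + ".geojson")));

//Parallel CPU work: the searches no longer compete for the disk head.
var results = names
    .AsParallel()
    .Select(name => FindFeatureContainingPoint(jsonByName[name], point))
    .ToArray();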
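
And a sketch of the streaming approach Jeroen Mostert suggests, using Newtonsoft's `JsonTextReader` to deserialize one feature at a time instead of materializing an 84 MB collection in memory. The token bookkeeping assumes the standard FeatureCollection layout, and `IsMatch` is a hypothetical point-in-feature test:

private GeoJSON.Net.Feature.Feature FindFeatureStreaming(string filePath, Point point)
{
    using (var sr = File.OpenText(filePath))
    using (var jr = new JsonTextReader(sr))
    {
        var serializer = JsonSerializer.CreateDefault();
        var inFeatures = false;
        while (jr.Read())
        {
            //Track entering and leaving the top-level "features" array.
            if (jr.Depth == 1 && jr.TokenType == JsonToken.PropertyName && (string)jr.Value == "features")
                inFeatures = true;
            else if (inFeatures && jr.Depth == 1 && jr.TokenType == JsonToken.EndArray)
                inFeatures = false;
            //Each feature object starts at depth 2; deserialize it on its own.
            else if (inFeatures && jr.Depth == 2 && jr.TokenType == JsonToken.StartObject)
            {
                var feature = serializer.Deserialize<GeoJSON.Net.Feature.Feature>(jr);
                if (IsMatch(feature, point)) //hypothetical point-in-feature check
                    return feature;
            }
        }
    }
    return null;
}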

0 Answers