
I need to find a way to read information out of a very big CSV file with Unity. The file has approx. 15000*4000 entries, is almost 200 MB, and could grow even larger.

Just using ReadAllLines on the file kind of works, but as soon as I try to do any operation on the data, it crashes. Here is the code I am using; it just counts all non-zero values and already crashes. It's okay if the code needs some loading time, but it shouldn't crash. I assume it's because I keep everything in memory and therefore flood my RAM? Any ideas how to fix this so it won't crash?

    private void readCSV()
    {
        string[] lines = File.ReadAllLines("Assets/Datasets/testCsv.csv");
        foreach (string line in lines)
        {
            List<string> values = new List<string>();
            values = line.Split(',').ToList();

            int i = 0;
            
            foreach (string val in values)
            {
                if (val != "0")
                {
                    i++;
                }              
            }
        }
    }
Denis
  • Does it give you an exception? What type of exception? What's the associated message? What does the stack look like? Have you tried doing this line by line, rather than all at once? Have you looked at using a CSV reader rather than doing it yourself? – Flydog57 Jan 31 '22 at 06:18
  • No, there is no exception. It kind of worked in a coroutine but it's very slow; I usually have to force kill it with the task manager, Unity just won't respond. If I do it line by line, it would take much longer, right? – Denis Jan 31 '22 at 06:25
  • As I already stated [in your other question](https://stackoverflow.com/questions/70888717/c-sharp-delete-each-row-in-a-big-csv-file-that-contains-a-specific-value-in-a-sp) you should rather go with a streamed solution so you don't load the entire thing into memory. Also, file IO is slow! Use a background thread / async Task for this – derHugo Jan 31 '22 at 07:16

1 Answer


As I already stated in your other question, you should rather go with a streamed solution so that the entire file is never loaded into memory at once.

Also, both file IO and string.Split are slow, especially for so many entries! Rather use a background thread / async Task for this!

The next possible future issue: in your case 15000*4000 entries means a total of 60000000 cells, which is still fine. However, the maximum value of int is 2147483647, so if your file grows by a factor of roughly 35 the counter might overflow and behave unexpectedly => rather use e.g. uint or directly ulong to avoid that issue.

private async Task<ulong> CountNonZeroEntries()
{
    ulong count = 0;

    // Using a stream reader you can load the content into memory one line at a time
    using(var sr = new StreamReader("Assets/Datasets/testCsv.csv"))
    {
        while(true)
        {
            var line = await sr.ReadLineAsync();

            // ReadLineAsync returns null once the end of the file is reached
            if(line == null) break;

            var values = line.Split(',');

            foreach(var v in values)
            {
               if(v != "0") count++;
            }
        }
    }

    return count;
}

And then of course you would need to wait for the result, e.g. using

// If you declare Start as async, Unity automatically calls it asynchronously
private async void Start()
{
    var count = await CountNonZeroEntries();

    Debug.Log($"{count} cells are != \"0\".");
}
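
Note that after each awaited ReadLineAsync, execution resumes on Unity's main thread (Unity installs a synchronization context for await continuations). If you want the entire counting loop off the main thread, a minimal variant of the Start above would wrap the call in Task.Run:

private async void Start()
{
    // Task.Run executes the whole counting loop on a thread-pool thread;
    // only the code after the await runs back on Unity's main thread
    var count = await Task.Run(CountNonZeroEntries);

    Debug.Log($"{count} cells are != \"0\".");
}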

The same can be done using Linq, which is a bit easier to write in my eyes:

using System.IO;
using System.Linq;
using System.Threading.Tasks;

...

private Task<ulong> CountNonZeroEntries()
{
    // Task.Run moves the whole enumeration onto a thread-pool thread;
    // SelectMany flattens the rows into a single sequence of cell values
    return Task.Run(() => (ulong)File.ReadLines("Assets/Datasets/testCsv.csv")
        .SelectMany(line => line.Split(','))
        .LongCount(v => v != "0"));
}

Also, File.ReadLines doesn't load the entire content at once but rather returns a lazy enumerable, so the Linq query consumes the lines one at a time.
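
For illustration, here is a minimal sketch of that lazy behavior (the PreviewFirstLines helper is hypothetical, not part of the solution above): only the lines actually enumerated are read from disk, no matter how large the file is.

using System.IO;
using System.Linq;

...

// Reads only the first five lines from disk; File.ReadLines yields
// lines lazily as the enumerable is consumed, so the remaining
// ~200 MB are never touched
private string[] PreviewFirstLines()
{
    return File.ReadLines("Assets/Datasets/testCsv.csv").Take(5).ToArray();
}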

derHugo