
What you need to know:

My application uses a database of foods, which exists in a .txt file. Each food has about 170 data values (2-3 digit numbers) separated by tabs, and the foods are separated by \n, so each line in this .txt file holds the data for one food.

The application's target platform is Android, it needs to work offline, and I use Unity with C# for coding.

My two problems are:

  1. Getting access to the .txt file

As it is not possible for Android applications to access a .txt file via

$"{Application.dataPath}/textFileName.txt"

I assigned the .txt file as a TextAsset (name: txtFile) in the Inspector. When the app is started for the first time, I load all the data of the TextAsset into a JSON-serialized List of strings (name: jsonStringList):

for (int i = 0; i < amountOfLinesInTextFile; i++)
{
    jsonStringList.Add(txtFile.text.Split('\n')[i]);
}

Technically that does work, but unfortunately the txtFile has a total of about 15000 lines, which makes it really slow (Stopwatch time for the for-loop: ≈750000 ms, which is about 12.5 minutes...)

Obviously it is not an option to let the user wait for that long when opening the app for the first time...
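As a minimal sketch of the direction the comments below take: split the file contents once and reuse the resulting array, instead of re-splitting the entire file on every loop iteration. Here `fileText` stands in for `txtFile.text` (illustration only; the variable names mirror the question):

```csharp
using System;
using System.Collections.Generic;

// "fileText" stands in for txtFile.text (assumption for illustration).
string fileText = "Apple\t52\nBanana\t89\nCherry\t50";

// One split for the whole file, instead of one split per line.
string[] lines = fileText.Split('\n');

// Copy all lines into the list in one go.
var jsonStringList = new List<string>(lines);
```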

  2. Searching in that jsonList
  • In the app it is possible to create a custom food by putting multiple foods together. To do that, the user has to search for a food and can then press the result to add it.

  • Currently I check in a for-loop whether the input of the searchbar InputField (name: searchBar) matches a food in jsonStringList and whether that food is not already displayed.

  • If both are true, I add the name of the food to a List<string> (name: results), which is what I use to display the matching foods. (As the data values of a food, including the name, are separated by tabs, I use .Split('\t') to get the correct value for the name of the food.)

      for (int i = 0; i < amountOfLinesInTextFile; i++)
      {
          string name = jsonStringList[i].Split('\t')[nameIndex].ToLower();
          if (name.Equals(searchBar.text.ToLower()) && !results.Contains(name))
          {
              results.Add(name);
          }
      }
    

Again: that technically works, but it is also too slow (even though it's much faster than problem 1).

(Stopwatch for the for-loop: ≈1600 ms)
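A sketch of one way to speed this up, along the lines the comments below suggest: split and lower each name once when the data loads, so each search only compares precomputed strings, and use a HashSet for `results` so the duplicate check is O(1). The sample data and names (`jsonStringList`, `nameIndex`) mirror the question and are for illustration only:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-ins mirroring the question's data (illustration only).
var jsonStringList = new List<string> { "Apple\t52", "Banana\t89" };
int nameIndex = 0;

// Split and lower every name ONCE, outside the per-search loop.
string[] loweredNames = jsonStringList
    .Select(line => line.Split('\t')[nameIndex].ToLower())
    .ToArray();

// HashSet.Contains is O(1), vs O(n) for List.Contains.
var results = new HashSet<string>();
string query = "apple";   // stands in for searchBar.text.ToLower()

foreach (string name in loweredNames)
{
    if (name == query)
    {
        results.Add(name);
    }
}
```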

I'd be very happy for any help to improve the time of those two actions! Maybe there is a whole different approach for handling such large .txt files, but every bit of decreasing the time would be helpful!

  • *txtFile.text.Split('\n')[i]* - that's enormous leakage; you split the entire file 15000 times just to get one line – eocron May 11 '22 at 17:15
  • How can I replace that? For _File.ReadAllLines_ I would need a path to the .txt file, but as the app is for Android - and from what I've read so far, Android includes the _.txt_ file in the _.apk_ of the app - I can't access it (or am I missing something?) – user19095510 May 11 '22 at 17:18
  • https://learn.microsoft.com/en-us/dotnet/api/system.io.stringreader?view=net-6.0 - for the search, use a *Dictionary* with the *StringComparer.OrdinalIgnoreCase* comparer. – eocron May 11 '22 at 17:19
  • *or am I missing something?* - you're somehow getting access to the file contents currently; that's good - it's not the slow part. The slow part is how you're treating the file; you're using a list for everything and splitting far too much. A massive part of designing efficient programs is making sensible data-container choices. Read up on the differences between a List, Dictionary, HashSet, array, LinkedList, Stack and Queue so you're better equipped to choose appropriately. Your `results` should perhaps be a HashSet, for example, if it has more than a low number of items – Caius Jard May 11 '22 at 17:25
  • You need to do `var splitResult = txtFile.text.Split('\n')` ONCE and then do `jsonStringList.Add(splitResult[i])` inside the loop (though I'd say to just iterate over the result of the split, and don't index). I don't think that always came across from the others. You're reading the *entire* file a number of times equal to the number of lines it has. And since you had `amountOfLinesInTextFile` already, it means you already read it at least once for that too! – Kevin Anderson May 11 '22 at 17:32
  • So I tried the first suggestion of @eocron's answer and I already got the loading time down to 500 ms, which is really amazing! I will try to improve that further with the other suggestions, thank you all!!! (I will comment the final time it takes when I've finished this part of the application & I would love to upvote your comments but I don't have enough points yet) – user19095510 May 11 '22 at 18:12

1 Answer


15000 lines is not a big file, really. You just do too many unnecessary reads/transformations. You need to do the work once, cache it (save it in a variable, in your case), and reuse it.

var foodIndex = txtFile
  .text
  .Split('\n')                 // get rows
  .Select(x => x.Split('\t'))  // get columns for each row
  .ToDictionary(x => x[nameIndex], StringComparer.OrdinalIgnoreCase); // build a case-insensitive search index

var myFood = foodIndex["aPpLe"];

This produces a Dictionary<string, string[]>.
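A usage sketch against that index (illustration only, with a tiny hand-built dictionary standing in for the one above). Two caveats worth knowing: `ToDictionary` throws on duplicate food names, and `TryGetValue` is safer than the indexer when the search text may not match anything:

```csharp
using System;
using System.Collections.Generic;

// Minimal stand-in for the index built above (illustration only).
var foodIndex = new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
{
    ["Apple"]  = new[] { "Apple", "52" },
    ["Banana"] = new[] { "Banana", "89" },
};

// TryGetValue avoids a KeyNotFoundException when nothing matches.
if (foodIndex.TryGetValue("aPpLe", out string[] row))
{
    Console.WriteLine(row[1]);   // prints "52"
}
```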

Better approach

Deserialize the CSV format (your file is obviously a CSV/TSV table) into a POCO row:

public class Food
{
   [DataMember(Order=1)] //here is your nameIndex
   public string Name {get;set;}
   [DataMember(Order=2)]
   public int Amount {get;set;}
   //...
}

var foodIndex = SomeCSVParse<Food>(txtFile.text)
  .ToDictionary(x=> x.Name, StringComparer.OrdinalIgnoreCase);

var myFood = foodIndex["aPpLe"];

This produces a Dictionary<string, Food> search index, which looks better and is easier to use.

This way, all the conversion from string to int/double/DateTime/etc., the order of columns, separators (comma, tab, whitespace), cultures (in case there are floats/doubles), efficient reading, headers, etc. can simply be delegated to a 3rd-party framework. Someone did this here - Parsing CSV files in C#, with header

There is also a plethora of frameworks on NuGet; just pick whatever is small/popular, or copy-paste from the sources - https://www.nuget.org/packages?q=CSV
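For example, with the CsvHelper NuGet package (a sketch under the assumption that the name is the first column; check the current CsvHelper docs for the exact API). `[Index]` maps columns by position since the file has no header row, and `fileText` stands in for `txtFile.text`:

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;
using CsvHelper;
using CsvHelper.Configuration;
using CsvHelper.Configuration.Attributes;

// "fileText" stands in for txtFile.text (illustration only).
string fileText = "Apple\t52\nBanana\t89";

var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    Delimiter = "\t",          // tab-separated values
    HasHeaderRecord = false,   // the .txt file has no header row
};

using var reader = new StringReader(fileText);
using var csv = new CsvReader(reader, config);

var foodIndex = csv.GetRecords<Food>()
    .ToDictionary(x => x.Name, StringComparer.OrdinalIgnoreCase);

public class Food
{
    [Index(0)] public string Name { get; set; }   // assumption: name is column 0
    [Index(1)] public int Amount { get; set; }
    // ... remaining ~170 columns
}
```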

And read more about data structures in C# - https://learn.microsoft.com/en-us/dotnet/standard/collections/

eocron