Creating dictionary from text file and then accessing it

Question

I'm building a program. It uses data stored in text files in the following format: {Index:Value}. Example:

1:10,86;
2:11,65;
3:13,32;
4:13,53;
5:13,93;
...
1500:1565,99;

There are roughly 1490-1500 lines in each file.

I need these files read into a dictionary, with the Index accessed as an integer and the Value as a double, each through different functions. Something like:

Data.ByIndex((integer) 3) - returns (double) 13,32

Getting the Index by Value would be a little trickier:

Data.ByClosestValue((double) 12,0) - returns (integer) 2
Data.ByClosestValue((double) 13) - returns (integer) 3
Data.ByClosestLower((double) 13,5) - returns (integer) 3
Data.ByClosestHigher((double) 13,5) - returns (integer) 4

There are a few key points:

It has to be really fast. Usually 10-15 data files are read at the same time, and the dictionary is accessed several times for each file.
No LINQ if possible.

For now I went with the following:

The File.ReadAllText() method seems the fastest to me.
Getting entries by using Split(';') and then Split(':').
Dictionary<string, string>. The reason I went with string types is that it will be faster to read from text data by using the Split() function.

What will be the optimal solution?

Update: most people would suggest storing the data in a database, and I agree that would be the best solution, but unfortunately I have no control over those data files. They are produced by another program, and there is a strict requirement that they must be editable by a human via Notepad or a similar editor.

1. Database isn't an option because data files are formed in a different application. — Technical, Jun 29 '17 at 15:01
2. The reason I don't want to use LINQ is because I need my program to work on .NET 2.0. — Technical, Jun 29 '17 at 15:02
I don't understand this part: `and the dictionary accessed several times for each file`. Why? You should load the dictionaries into memory at the start of the application, so you can use them at any time. — Maciej Los, Jun 29 '17 at 15:03
for the second lookup (by closest lower/higher), [look into this](https://stackoverflow.com/questions/12412869/efficiently-find-nearest-dictionary-key) and use the double or rounded to int as the key, and a sublist for the corresponding indices/subset (if the same double can occur under multiple indices). — Cee McSharpface, Jun 29 '17 at 15:04
Sorry, I meant that there will be several Data.ByIndex(), Data.ByClosestValue(), Data.ByClosestLower() and Data.ByClosestHigher() calls for each dictionary. After that, a new dictionary must be built for the next data file. — Technical, Jun 29 '17 at 15:06
Don't store strings if you always want to calculate with the values, split and parse it once and then store it as the correct type in the dictionary. — Tim Schmelter, Jun 29 '17 at 15:08
.NET Framework 3.5, which uses CLR 2.0, is available from Windows XP onwards - what platform are you developing on? — user1859022, Jun 29 '17 at 15:14
There is a requirement that my program must run on a wide range of Windows - from XP SP1 to 7. — Technical, Jun 29 '17 at 15:21
Instead of File.ReadAllText(), use File.ReadAllLines()... my 2 cents on perf — dipak, Jun 29 '17 at 15:26
You know what - sometimes our job as a programmer is to let the "requirements" people know when their requirements are dumb! — Jamiec, Jun 29 '17 at 15:26
Just guessing: 13,32 does not fit culture invariant conversion. Which culture do you use for conversion from string? — joe, Jun 29 '17 at 15:46
I hope you charge extra for trying to support unsupported platforms (XP)? — Hans Kesting, Jun 29 '17 at 15:50

score 1 · Accepted Answer · answered Jun 29 '17 at 15:38

I strongly believe that reading the files into memory and parsing the values to int and double (so you can store them in an IDictionary<int, double>) before use will be the most efficient solution in this case. You can use SortedDictionary to make it more efficient. Your ByIndex() function will be trivial:

double ByIndex(int index)
{
  // value stays 0.0 when the index is not in the dictionary.
  double value = 0.0;
  Data.TryGetValue(index, out value);
  return value;
}
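
As a rough illustration of the loading step described in this answer, here is a minimal sketch that parses each line once and stores typed values. The names `DataFile`, `Parse`, `Load` and `CommaDecimal` are hypothetical and not taken from the question. It assumes the values always use a decimal comma (as in `13,32`) and that lines which do not parse can simply be skipped; both are assumptions about the data files, not confirmed facts.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

static class DataFile
{
    // Number format with "," as the decimal separator, matching values like "13,32".
    // Assumption: the files never use "." as the decimal separator.
    static readonly NumberFormatInfo CommaDecimal = CreateFormat();

    static NumberFormatInfo CreateFormat()
    {
        NumberFormatInfo format = new NumberFormatInfo();
        format.NumberDecimalSeparator = ",";
        format.NumberGroupSeparator = ".";
        return format;
    }

    // Reads "Index:Value;" lines and returns them as typed pairs.
    public static Dictionary<int, double> Parse(TextReader reader)
    {
        Dictionary<int, double> data = new Dictionary<int, double>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            line = line.Trim().TrimEnd(';');
            int colon = line.IndexOf(':');
            if (colon < 0) continue; // assumption: lines without ':' are skipped

            int index;
            double value;
            if (int.TryParse(line.Substring(0, colon), NumberStyles.Integer,
                             CultureInfo.InvariantCulture, out index) &&
                double.TryParse(line.Substring(colon + 1), NumberStyles.Float,
                                CommaDecimal, out value))
            {
                data[index] = value; // a repeated index overwrites the earlier entry
            }
        }
        return data;
    }

    // Convenience wrapper for reading one data file from disk.
    public static Dictionary<int, double> Load(string path)
    {
        using (StreamReader reader = File.OpenText(path))
        {
            return Parse(reader);
        }
    }
}
```

Using an explicit `NumberFormatInfo` keeps the parsing independent of the machine's regional settings, which matters if the program runs on systems with different locales. Nothing here needs LINQ.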

Other functions:

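int ByClosestValue(double val)
{
  // Returns the first entry whose value rounds to the same whole number as val,
  // or -1 if there is none.
  int closest = -1;
  foreach (KeyValuePair<int, double> v in Data)
  {
    if (Math.Round(v.Value, 0) == Math.Round(val, 0))
    {
      closest = v.Key;
      break;
    }
  }
  return closest;
}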

ByClosestLower and ByClosestHigher are almost the same: you only need to call Floor() and Ceiling() instead of Round(). You can also use this to your advantage: [Efficiently find nearest dictionary key](https://stackoverflow.com/questions/12412869/efficiently-find-nearest-dictionary-key)
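
One possible alternative to scanning every entry is a binary search over the values sorted in ascending order. The sketch below is a swapped-in technique, not the rounding-based loop shown in this answer. The names `ValueLookup` and `LowerBound` are hypothetical. It assumes that "lower" means the nearest value not above the argument, that "higher" means the nearest value not below it, and that -1 signals "no match"; the question does not pin any of these down.

```csharp
using System;
using System.Collections.Generic;

class ValueLookup
{
    private readonly double[] values; // sorted ascending
    private readonly int[] indices;   // indices[i] is the file index of values[i]

    public ValueLookup(IDictionary<int, double> data)
    {
        values = new double[data.Count];
        indices = new int[data.Count];
        int i = 0;
        foreach (KeyValuePair<int, double> pair in data)
        {
            indices[i] = pair.Key;
            values[i] = pair.Value;
            i++;
        }
        Array.Sort(values, indices); // sorts values and reorders indices to match
    }

    // Position of the first element >= val, or values.Length if there is none.
    private int LowerBound(double val)
    {
        int lo = 0, hi = values.Length;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (values[mid] < val) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Index of the nearest value >= val, or -1 if every value is smaller.
    public int ByClosestHigher(double val)
    {
        int pos = LowerBound(val);
        return pos < values.Length ? indices[pos] : -1;
    }

    // Index of the nearest value <= val, or -1 if every value is larger.
    public int ByClosestLower(double val)
    {
        int pos = LowerBound(val);
        if (pos < values.Length && values[pos] == val) return indices[pos];
        return pos > 0 ? indices[pos - 1] : -1;
    }

    // Index of the nearest value on either side; a tie goes to the lower value.
    public int ByClosestValue(double val)
    {
        if (values.Length == 0) return -1;
        int pos = LowerBound(val);
        if (pos == 0) return indices[0];
        if (pos == values.Length) return indices[pos - 1];
        return (val - values[pos - 1] <= values[pos] - val) ? indices[pos - 1] : indices[pos];
    }
}
```

Each lookup takes O(log n) comparisons after a one-time sort per file. If the same value occurs under several indices, which of them is returned is not defined, because `Array.Sort` is not a stable sort.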

Why use SortedDictionary? Isn't it the slowest of Dictionaries? — Technical, Jun 29 '17 at 15:50
It could be slower when you insert values (in your case 1500 * 15 = 22500 entries, so I doubt you would notice), but faster to access them (I suppose your task is to access values far more often than to insert them) — Roman Ananyev, Jun 29 '17 at 15:58
