
I'm building a program. It uses data stored in text files in the following format {Index:Value}. Example:

1:10,86;
2:11,65;
3:13,32;
4:13,53;
5:13,93;
...
1500:1565,99;

There are roughly 1490-1500 lines in each file.

I need those files read into a dictionary, with Index stored as an integer and Value as a double, and both accessible through separate functions. Something like:

Data.ByIndex((integer) 3) - returns (double) 13,32

Getting the Index by Value is a little more tricky:

Data.ByClosestValue((double) 12,0) - returns (integer) 2
Data.ByClosestValue((double) 13) - returns (integer) 3
Data.ByClosestLower((double) 13,5) - returns (integer) 3
Data.ByClosestHigher((double) 13,5) - returns (integer) 4
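One way to make all three "closest" lookups fast without LINQ is to copy the entries once into two parallel arrays sorted by value and binary-search them. The sketch below is a hypothetical `ValueIndex` class (not part of any library); it assumes -1 as the "not found" result and that ties in ByClosestValue go to the lower value.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical lookup class: copies the parsed entries into two parallel
// arrays sorted by value, so the "closest" queries use binary search
// (O(log n)) instead of scanning every entry. No LINQ is used.
public sealed class ValueIndex
{
    private readonly double[] values;   // sorted ascending
    private readonly int[] indices;     // indices[i] is the Index for values[i]

    public ValueIndex(IDictionary<int, double> data)
    {
        values = new double[data.Count];
        indices = new int[data.Count];
        int i = 0;
        foreach (KeyValuePair<int, double> pair in data)
        {
            values[i] = pair.Value;
            indices[i] = pair.Key;
            i++;
        }
        // Sorts values and reorders indices in lockstep.
        Array.Sort(values, indices);
    }

    // Position of the first value >= target (values.Length if there is none).
    private int LowerBound(double target)
    {
        int lo = 0, hi = values.Length;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (values[mid] < target) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Index of the largest value <= target, or -1 if every value is higher.
    public int ByClosestLower(double target)
    {
        int pos = LowerBound(target);
        if (pos < values.Length && values[pos] == target) return indices[pos];
        return pos == 0 ? -1 : indices[pos - 1];
    }

    // Index of the smallest value >= target, or -1 if every value is lower.
    public int ByClosestHigher(double target)
    {
        int pos = LowerBound(target);
        return pos == values.Length ? -1 : indices[pos];
    }

    // Index of the nearest value in either direction; ties go to the lower value.
    public int ByClosestValue(double target)
    {
        int pos = LowerBound(target);
        if (pos == 0) return values.Length == 0 ? -1 : indices[0];
        if (pos == values.Length) return indices[pos - 1];
        double below = target - values[pos - 1];
        double above = values[pos] - target;
        return above < below ? indices[pos] : indices[pos - 1];
    }
}
```

With the first five sample entries, `ByClosestValue(12.0)` returns 2, `ByClosestValue(13)` returns 3, `ByClosestLower(13.5)` returns 3 and `ByClosestHigher(13.5)` returns 4, since 13,53 is the smallest value not below 13,5. Building the arrays costs one sort per file (O(n log n) for about 1500 entries); each query after that is O(log n).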

There are a few key requirements:

  1. It has to be really fast. Usually 10-15 data files are read at the same time, and the dictionary is accessed several times for each file.
  2. No LINQ if possible.

For now I went with the following:

  1. The File.ReadAllText() method seems the fastest to me.
  2. Getting entries by using Split(';') and then Split(':').
  3. Dictionary<string, string>. I went with string types because I assumed it would be faster to fill the dictionary directly from the output of Split().
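As suggested in the comments, the strings can be parsed once and stored as int and double. A minimal parsing sketch follows; `DataFileParser` is a hypothetical name, and the sketch assumes every value uses a comma as the decimal separator (as in the sample), so it parses with an explicit `NumberFormatInfo` instead of relying on the machine's current culture.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: parses "Index:Value;" entries such as "3:13,32;"
// into a typed dictionary once, so later lookups need no string work.
public static class DataFileParser
{
    // Assumes a comma decimal separator for every value in the file.
    private static readonly NumberFormatInfo CommaDecimal = CreateCommaDecimal();

    private static NumberFormatInfo CreateCommaDecimal()
    {
        NumberFormatInfo nfi = new NumberFormatInfo();
        nfi.NumberDecimalSeparator = ",";
        nfi.NumberGroupSeparator = ".";   // keep the two separators distinct
        return nfi;
    }

    public static Dictionary<int, double> Parse(string text)
    {
        Dictionary<int, double> data = new Dictionary<int, double>();
        string[] entries = text.Split(new char[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string raw in entries)
        {
            string entry = raw.Trim();          // drop the line breaks between entries
            if (entry.Length == 0) continue;
            int colon = entry.IndexOf(':');
            if (colon < 0) continue;            // skip anything that is not Index:Value
            int index = int.Parse(entry.Substring(0, colon), CultureInfo.InvariantCulture);
            double value = double.Parse(entry.Substring(colon + 1), NumberStyles.Float, CommaDecimal);
            data[index] = value;
        }
        return data;
    }
}
```

The text argument can come straight from `File.ReadAllText(path)`, which is available in .NET 2.0.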

What would be the optimal solution?

Update: Most people would suggest storing the data in a database, and I agree that would be the best solution, but unfortunately I have no control over those data files. They are generated by another program, and there is a strict requirement that they remain editable by a human in Notepad or a similar text editor.

Uwe Keim
Technical
  • Is there a reason you don't want to use a database? – maccettura Jun 29 '17 at 14:57
  • Could you elaborate on why you don't want to use LINQ? – user1859022 Jun 29 '17 at 14:58
  • 1. Database isn't an option because data files are formed in a different application. – Technical Jun 29 '17 at 15:01
  • 2. The reason I don't want to use LINQ is because I need my program to work on .NET 2.0. – Technical Jun 29 '17 at 15:02
  • I don't understand this part: `and the dictionary accessed several times for each file`. Why? You should load the dictionaries into memory at the start of the application so you can use them at any time. – Maciej Los Jun 29 '17 at 15:03
  • for the second lookup (by closest lower/higher), [look into this](https://stackoverflow.com/questions/12412869/efficiently-find-nearest-dictionary-key) and use the double or rounded to int as the key, and a sublist for the corresponding indices/subset (if the same double can occur under multiple indices). – Cee McSharpface Jun 29 '17 at 15:04
  • Sorry, I meant that there will be several Data.ByIndex(), Data.ByClosestValue(), Data.ByClosestLower() and Data.ByClosestHigher() callouts for each dictionary. After that a new dictionary for the next data file must be formed. – Technical Jun 29 '17 at 15:06
  • Don't store strings if you always want to calculate with the values; split and parse them once, then store them as the correct type in the dictionary. – Tim Schmelter Jun 29 '17 at 15:08
  • .NET Framework 3.5, which uses CLR 2.0, is available from Windows XP onwards. What platform are you developing on? – user1859022 Jun 29 '17 at 15:14
  • There is a requirement that my program must run on a wide range of Windows - from XP SP1 to 7. – Technical Jun 29 '17 at 15:21
  • Instead of File.ReadAllText(), use File.ReadAllLines(); it will save a little in performance. – dipak Jun 29 '17 at 15:26
  • You know what - sometimes our job as a programmer is to let the "requirements" people know when their requirements are dumb! – Jamiec Jun 29 '17 at 15:26
  • Just guessing: 13,32 does not fit culture-invariant conversion. Which culture do you use for the conversion from string? – joe Jun 29 '17 at 15:46
  • I hope you charge extra for trying to support unsupported platforms (XP)? – Hans Kesting Jun 29 '17 at 15:50

1 Answer


I strongly believe that reading the files into memory and parsing the values to int and double before use (so that you can store them in an IDictionary<int, double>) is the most effective solution in this case. You can use SortedDictionary so that the entries are also kept in index order. Your ByIndex() function will then be trivial:

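double ByIndex(int index)
{
  // Data is the IDictionary<int, double> filled when the file was loaded.
  // TryGetValue sets value to 0.0 when the index is not present.
  double value;
  Data.TryGetValue(index, out value);
  return value;
}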

Other functions:

int ByClosestValue(double val)
{
  int closest = -1;
  // Return the first entry whose value rounds to the same whole number as val
  foreach (var v in Data)
  {
    if (Math.Round(v.Value, 0) == Math.Round(val, 0))
    {
      closest = v.Key;
      break;
    }
  }
  return closest;
}
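Comparing rounded values only finds entries in the same whole-number bucket. With the sample data it returns -1 for 15 (no value rounds to 15), and when entries are enumerated in index order it returns 4 (13,53) for 13,9 even though 13,93 is nearer. If an exact nearest match is needed, a single pass that tracks the smallest absolute difference is a simple alternative. `NearestScan` is a hypothetical helper that takes the dictionary as a parameter instead of using the answer's `Data` field.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: exact nearest-value search in one O(n) pass that keeps
// the entry with the smallest absolute difference instead of comparing rounded values.
public static class NearestScan
{
    // Returns the Index whose Value is nearest to val, or -1 for an empty dictionary.
    // On a tie, the entry enumerated first wins.
    public static int ByClosestValue(IDictionary<int, double> data, double val)
    {
        int closest = -1;
        double bestDistance = double.MaxValue;
        foreach (KeyValuePair<int, double> entry in data)
        {
            double distance = Math.Abs(entry.Value - val);
            if (distance < bestDistance)
            {
                bestDistance = distance;
                closest = entry.Key;
            }
        }
        return closest;
    }
}
```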

ByClosestLower and ByClosestHigher are almost the same; you only need to call Floor() and Ceiling() instead of Round(). You can also use the approach from "Efficiently find nearest dictionary key" to your advantage.
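Comparing Floor() or Ceiling() results has the same bucket limitation. With the sample data enumerated in index order, a Floor()-based ByClosestLower(13,9) would stop at 13,32 (index 3), although 13,53 (index 4) is the largest value not above 13,9. A direct comparison avoids that. The sketch below uses hypothetical `BoundScan` helpers that take the dictionary as a parameter and return -1 when nothing qualifies.

```csharp
using System.Collections.Generic;

// Hypothetical helpers: exact "closest lower/higher" searches by direct comparison.
public static class BoundScan
{
    // Index of the largest Value that is <= val, or -1 if none.
    public static int ByClosestLower(IDictionary<int, double> data, double val)
    {
        int result = -1;
        double best = double.NegativeInfinity;
        foreach (KeyValuePair<int, double> entry in data)
        {
            if (entry.Value <= val && entry.Value > best)
            {
                best = entry.Value;
                result = entry.Key;
            }
        }
        return result;
    }

    // Index of the smallest Value that is >= val, or -1 if none.
    public static int ByClosestHigher(IDictionary<int, double> data, double val)
    {
        int result = -1;
        double best = double.PositiveInfinity;
        foreach (KeyValuePair<int, double> entry in data)
        {
            if (entry.Value >= val && entry.Value < best)
            {
                best = entry.Value;
                result = entry.Key;
            }
        }
        return result;
    }
}
```

Both helpers are O(n) per call; if they are called many times per file, sorting the values once and binary-searching them is the faster option.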

Roman Ananyev
  • Why use SortedDictionary? Isn't it the slowest of the dictionaries? – Technical Jun 29 '17 at 15:50
  • It could be slower when you insert values (in your case 1500 * 15 = 22500 entries; I doubt you would notice), but faster to access them (I suppose your task is to access values far more often than to insert them). – Roman Ananyev Jun 29 '17 at 15:58
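For reference, the .NET documentation lists SortedDictionary<TKey, TValue> retrieval as O(log n) (it is a binary search tree), while Dictionary<TKey, TValue> retrieval approaches O(1). Plain key lookups such as ByIndex() are therefore not faster with SortedDictionary. What SortedDictionary does guarantee is enumeration in key order, which matters when a loop stops at the first match, as ByClosestValue() does; for a plain Dictionary the documentation leaves the enumeration order undefined. A small check with a hypothetical `KeysInOrder` helper:

```csharp
using System.Collections.Generic;

public static class OrderDemo
{
    // Hypothetical helper: collects keys in the order the dictionary enumerates them.
    public static List<int> KeysInOrder(IDictionary<int, double> data)
    {
        List<int> keys = new List<int>();
        foreach (KeyValuePair<int, double> entry in data)
        {
            keys.Add(entry.Key);
        }
        return keys;
    }
}
```

Inserting the keys 3, 1, 2 into a SortedDictionary and passing it to KeysInOrder yields 1, 2, 3.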