2

I'm working on a very small scale program mostly working on using word initials to search for full strings and extra information attached to it, including in-program CRUD operations. Since there are only ~10k strings to be searched, I would prefer to use in-memory approach to quicken searches (since most of the queries is SELECT).

Is it a good approach to simply use things like List<myobject> data and do something like this: data.Where((s => s.text.Substring(0,3) == expected));? Are there anything I can do regarding database optimization (like indexing) when I use this method?

xxbidiao
  • 834
  • 5
  • 14
  • 27

4 Answers4

3

TL;DR
I suggest evaluating your requirements. If using a simple List<> meets your requirements and you are happy with performance results of profiling your application then use it. If not consider using an in-memory database.


One of the main reason database are fast enough to retrieve data is Index. When you use common C# data structures you miss this feature. I think if you are tackling with small number of records then you would not face performance problem but if you have many records then you should consider using in memory databases. Keep in mind that the chosen database may not support the way you want to index data or you have complex queries which indexing does not improve their performance.

If you just have keys and corresponding values then take a look at Redis and StackExchange.Redis.

Another thing to consider is concurrency! Databases usually support accessing data from multiple threads and handle multiple reader or writer on same data. You can use thread-safe collections in .NET but you have to do a lot to get features which are built in databases.

Community
  • 1
  • 1
Hamid Pourjam
  • 20,441
  • 9
  • 58
  • 74
  • Too bad to hear that Indexing is not available. But let's see to what extent these internal structures can do in this specific problem. – xxbidiao Apr 28 '16 at 06:04
2

If you are only doing prefix searches (as in the question) you can probably get away with using a list, as long as you keep it sorted and do a binary search instead of a linear one (which is what where will do).

A dictionary, while much faster if you are doing exact matches, isn't the right tool here because you want to do searches which will still be O(N).

If you want to use LINQ you are probably better off just using EF and SQL Server CE. This is a pretty painless option, though you are obviously adding some significant dependencies.

If you want to roll your own solution in C# that will work pretty much the same way a database would the datastructure you are looking for is called a Trie[1] This still won't give you LINQ (unless you write a bunch more code) but will give you good search performance.

[1]https://en.wikipedia.org/wiki/Trie

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Yaur
  • 7,333
  • 1
  • 25
  • 36
  • So do I need to tell the framework anything if I would like `where` to do binary search? Since binary searches are only good at sorted arrays. – xxbidiao Apr 28 '16 at 06:13
  • 1
    @xxbidiao no. you are going to want to use `List.BinarySearch` instead of LINQ – Yaur Apr 28 '16 at 06:16
  • I wonder whether `SortedList` do this job well (Can use `where` to do binary search). – xxbidiao Apr 28 '16 at 06:48
  • I don't really have any experience with `SortedList` and can't find anything regarding the cyclomatic complexity of linq vs that collection. You might want to profile vs the BinarySearch method. – Yaur Apr 29 '16 at 17:47
1

If you are using data structure so it is better that you will use data structure with complexity of o(1) like dictionary and hash table and not list that had more slower complexity. Be aware that list is taking you some of your free memory.

The scion
  • 1,001
  • 9
  • 19
1

Check this test:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace CSharpConsoleApplication.Tests
{
    class JustATest
    {
        public static void Run()
        {
            var list = new List<Test>();

            for (int i = 0; i < 1000000; i++)
                list.Add(new Test() { Text = "a" + i.ToString().PadLeft(6, '0') });


            string input = "a011";
            List<Test> found = null;




            // Get the results with LinQ

            var w = new Stopwatch(); w.Start();
            found = list.Where(t => t.Text.Substring(0, 4) == input).ToList();
            w.Stop();
            Console.WriteLine("Search list with linq. Results count = {0}", found.Count);
            Console.WriteLine(w.Elapsed);
            Console.ReadLine();




            // Store data in dictionary if no refresh needed

            // Populate the dictionary
            var objectsDictionary = new Dictionary<string, List<Test>>();

            w.Restart();
            PopulateDictionary(objectsDictionary, list, input.Length);
            w.Stop();
            Console.WriteLine("Populate dictionary");
            Console.WriteLine(w.Elapsed);
            Console.ReadLine();

            // Search in dictionary
            w.Restart();
            if (objectsDictionary.ContainsKey(input))
                found = objectsDictionary[input];
            //objectsDictionary[input].ForEach(t => Console.WriteLine(t.Text));

            w.Stop();
            Console.WriteLine("Search in dictionary. Results count = {0}", found.Count);
            Console.WriteLine(w.Elapsed);
            Console.ReadLine();
        }

        static void PopulateDictionary(Dictionary<string, List<Test>> dictionary, List<Test> list, int textLength)
        {
            foreach (var t in list)
            {
                string text = t.Text.Substring(0, textLength);

                if (dictionary.ContainsKey(text))
                    dictionary[text].Add(t);
                else
                    dictionary.Add(text, new List<Test>() { t });
            }
        }

        class Test
        {
            public string Text { get; set; }
        }

    }
}
shadow
  • 1,883
  • 1
  • 16
  • 24
  • Interesting. If only searching for fixed length, dictionary seems to be faster than list just a tiny bit. Actually it's a good idea to have duplicated data if data itself is small. Output on an online c# runner is:`Search list with linq. Results count = 1000 00:00:00.2860005 Populate dictionary 00:00:00.2699203 Search in dictionary. Results count = 1000 00:00:00.0000888` – xxbidiao Apr 28 '16 at 07:01
  • I'm not sure if that will help. My though was that if your data do not change frequently, storing them in a dictionary (with some overhead of course) using as key the search criteria, would produce much faster search results. – shadow Apr 28 '16 at 07:10