
I've got a search box that I'm providing autocomplete suggestions for, but it's really slow: it takes multiple seconds for suggestions to appear. I'm pretty sure my code is inefficient, but I'm not sure of the best way to improve it. Any suggestions?

    [HttpPost]
    [Route("search")]
    public virtual JsonResult Search(string term)
    {
        var result = new List<SearchResult>();

        if (!String.IsNullOrWhiteSpace(term))
        {
            var searchTerms = term.ToLower().Split(' ');
            List<Card> resultList = null;
            foreach (var query in searchTerms)
            {
                if (resultList == null)
                {
                    resultList = CardRepository.FindAll().Where(x => x.Name.ToLower().Contains(query) || x.Set.SetName.ToLower().Contains(query) || x.Variant.ToLower().Contains(query) 
                    || x.CardNumber.ToLower().Contains(query) || (query == "holo" && x.IsHolo)).ToList();
                }
                else
                {
                    resultList = resultList.Where(x => x.Name.ToLower().Contains(query) || x.Set.SetName.ToLower().Contains(query) || x.Variant.ToLower().Contains(query) 
                    || x.CardNumber.ToLower().Contains(query) || (query == "holo" && x.IsHolo)).ToList();
                }
            }

            foreach (var item in resultList.Take(10))
            {
                result.Add(new SearchResult()
                {
                    label = item.FullCardName,
                    value = item.CardId.ToString()
                });
            }
        }

        return Json(result);
    }

EDIT: Added the FindAll() code.

 private readonly IDatabase _database;
 public IQueryable<Card> FindAll()
 {
     return _database.CardDataSource.OrderBy(a => a.Name).AsQueryable();
 }

SOLUTION: Following the advice in the comments, and with reference to the post Full Text Search with LINQ, I moved the searching into the repository as a method, and the result is almost instant autocomplete suggestions. I'm not sure how much better the performance could get, but it's easily usable in its current state.

public Card[] Search(string[] searchTerms)
{
    IQueryable<Card> cardQuery = _database.CardDataSource;
    foreach(var term in searchTerms)
    {
        var currentTerm = term.Trim();
        cardQuery = cardQuery.Where(p => (p.Name.Contains(currentTerm) ||
                                            p.Variant.Contains(currentTerm) ||
                                            p.CardNumber.Contains(currentTerm) ||
                                            p.Set.SetName.Contains(currentTerm) ||
                                            (currentTerm == "holo" && p.IsHolo) ||
                                            (currentTerm == "reverse" && p.IsHolo))
                                        );
    }

    return cardQuery.Take(10).ToArray();
}

[HttpPost]
[Route("search")]
public virtual JsonResult Search(string term)
{
    var result = new List<SearchResult>();

    if (!String.IsNullOrWhiteSpace(term))
    {
        var searchTerms = term.ToLower().Split(' ');
        var resultList = CardRepository.Search(searchTerms);

        foreach (var item in resultList)
        {
            result.Add(new SearchResult()
            {
                label = item.FullCardName,
                value = item.CardId.ToString()
            });
        }
    }

    return Json(result);
}
Corro
  • Welcome to Stack Overflow. Please read the [tour] and [ask], emphasis on [mcve]. All code that actually does something remotely performance-related is abstracted away into `CardRepository.FindAll()`. – CodeCaster Mar 06 '19 at 09:09
  • This doesn't seem to be LINQ - it's some backend that uses a fluent interface similar to LINQ. Please add an appropriate tag. – BartoszKP Mar 06 '19 at 09:09
  • Your repository probably goes to the moon-and-back to load everything up in memory; if you are using EF, ditch the repository and just use LINQ to Entities – TheGeneral Mar 06 '19 at 09:11
  • The LINQ to Objects code is overcomplicated too. Lots of calls to `ToLower()` instead of using eg `String.Contains()` with `ignoreCase: true`. That `Where()` will have to scan all items too, to find a match. And yet, `Take(10)` will only keep the first 10 results, even though it scanned everything – Panagiotis Kanavos Mar 06 '19 at 09:14
  • @MikaelDúiBolinder no, and both are redundant. Both generate temporary strings that need to be garbage-collected. That alone eradicates any performance difference, if there is one – Panagiotis Kanavos Mar 06 '19 at 09:15
  • The query could be rewritten as `Where(x=> String.Contains(x.Name,query,true) || String.Contains(x.Set.SetName,query,true) ||....).Take(10)`. That wouldn't create any temporary strings and stop as soon as 10 results were found – Panagiotis Kanavos Mar 06 '19 at 09:23
  • @Corro what are you trying to do? It almost looks like you're trying to perform a full-text search. Perhaps you should do just that, and use eg SQL Server's full-text search, or a similar operation depending on the database you use? – Panagiotis Kanavos Mar 06 '19 at 09:47
  • The main flaw is that you don't compose an `IQueryable` first and then execute it as one SQL query. You should start with `IQueryable cards = CardRepository.FindAll();` and then add `cards = cards.Where()` clauses to it. See [here](https://stackoverflow.com/q/5595338/861716) for just one of many examples. Of course this only succeeds if `CardRepository.FindAll()` actually returns a `DbSet`. Too bad you don't show what happens inside that method. If you take this approach you probably don't need these `ToLower` calls either, because the database collation determines the comparison. – Gert Arnold Mar 06 '19 at 09:57
  • @Corro you could also use structures and algorithms suited to searching. For example, you could create tries(prefix trees) from the lookup text which would work like indexes. Instead of scanning all items to find matches you'd only need to perform a few operations. [This Visual Studio Magazine article](https://visualstudiomagazine.com/articles/2015/10/20/text-pattern-search-trie-class-net.aspx) shows how you can construct a trie to speed up substring searches – Panagiotis Kanavos Mar 06 '19 at 09:58
  • @PanagiotisKanavos thanks for the responses, I've added my FindAll() code, see my response to Tom Chantler below for what I'm trying to do (I didn't realise I could only tag one person in a comment). I will try some of the suggestions and see how I go. – Corro Mar 06 '19 at 22:50
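To make the in-memory rewrite suggested in the comments above concrete, here is a minimal sketch (an illustration, not code from the thread): it uses IndexOf with a StringComparison so no lower-cased copies are allocated, and deferred execution so the scan stops as soon as ten matches are found. The names resultList (an in-memory List<Card>) and query (a single lower-cased search term) are assumptions:

    // Sketch only: resultList is assumed to be an in-memory List<Card> and
    // query a single search term. IndexOf with StringComparison avoids the
    // temporary strings from ToLower(), and because Where/Take are deferred,
    // the enumeration stops once ten matches have been found.
    var suggestions = resultList
        .Where(x => x.Name.IndexOf(query, StringComparison.OrdinalIgnoreCase) >= 0 ||
                    x.Set.SetName.IndexOf(query, StringComparison.OrdinalIgnoreCase) >= 0 ||
                    x.Variant.IndexOf(query, StringComparison.OrdinalIgnoreCase) >= 0 ||
                    x.CardNumber.IndexOf(query, StringComparison.OrdinalIgnoreCase) >= 0 ||
                    (query == "holo" && x.IsHolo))
        .Take(10)
        .ToList();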

2 Answers


I think that the main problem is that you're using .FindAll() which returns a List<T>.

This means that when you say CardRepository.FindAll() it gets all of the records into an in-memory list, and then your subsequent refining queries (e.g. Where(x => x.Name.ToLower().Contains(query)) and so on) all run against the entire list. So it's no surprise that it's really slow.

You could try rewriting it by simply removing the .FindAll() and see what happens.

Please note, I'm just giving you the main problem, there are other issues, but none is as important as this one.
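As a rough sketch of that rewrite (an illustration, not the answerer's code): keep everything as an IQueryable<Card> and compose one Where per term, so the whole filter is translated into a single database query. This assumes FindAll() returns an IQueryable<Card> backed by LINQ to SQL or EF (the question's edit suggests it does), and that the column collation is case-insensitive so the ToLower() calls aren't needed:

    // Sketch only: assumes FindAll() returns a LINQ to SQL / EF IQueryable<Card>.
    // Nothing executes until ToList(), so all filters become one SQL query, and a
    // case-insensitive database collation handles the matching.
    IQueryable<Card> cards = CardRepository.FindAll();
    foreach (var query in searchTerms)
    {
        var q = query;   // local copy so each Where clause captures its own term
        cards = cards.Where(x => x.Name.Contains(q) ||
                                 x.Set.SetName.Contains(q) ||
                                 x.Variant.Contains(q) ||
                                 x.CardNumber.Contains(q) ||
                                 (q == "holo" && x.IsHolo));
    }
    var topTen = cards.Take(10).ToList();   // single round trip to the database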

Tom Chantler
  • Yep. I have just added a note to say that I'm only giving the biggest problem as I don't have time to go into more detail right now. – Tom Chantler Mar 06 '19 at 09:23
  • Looking at the code, it's *painfully* slow, performing one search or scan per term. The database query could be rewritten to use FTS or join with a TVP while the LINQ part could be modified to use a regex to search for multiple words at once. – Panagiotis Kanavos Mar 06 '19 at 09:45
  • You know what, I thought about that but, because it doesn't mention SQL Server, I wasn't sure I should mention it, in case the backing store is not SQL. I then wondered about generic stemming and lexing and parsing and... well, you know ;-) But it sounds like you and I are on the same page :-) – Tom Chantler Mar 06 '19 at 09:54
  • Thanks @TomChantler and PanagiotisKanavos. I've added what my FindAll() code does. Basically I'm trying to give accurate autocomplete suggestions to users as they type into a search box from a table of about 40,000 rows. I do use SQL Server, this is just a repository pattern project using LINQ to SQL. I've not used FTS before, I will look into that and also try your other suggestions, thanks :) – Corro Mar 06 '19 at 22:45
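For anyone following the full-text-search idea raised in these comments, here is a heavily hedged sketch, not code from the thread: it assumes a SQL Server full-text index already exists on the card columns, that the table and column names (Card, Name, Variant, CardNumber) match the entity, and that the underlying LINQ to SQL DataContext is reachable from the repository as the made-up name _context:

    // Sketch only: CONTAINS requires a full-text index on the listed columns;
    // all table/column names and _context are assumptions for illustration.
    string ftsCondition = string.Join(" AND ",
        searchTerms.Select(t => "\"" + t + "*\""));   // e.g. "char*" AND "holo*"

    var cards = _context.ExecuteQuery<Card>(
        "SELECT TOP 10 * FROM Card WHERE CONTAINS((Name, Variant, CardNumber), {0})",
        ftsCondition).ToList();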

You could use multi-threading like this (pseudo-C# code):

var allCards = CardRepository.FindAll().ToArray(); // Ensure array.
query = query.ToUpper();

var nameTask = Task.Run(() => allCards.Where(x => x.Name.ToUpper().Contains(query)).ToArray());
var setTask = Task.Run(() => allCards.Where(x => x.Set.SetName.ToUpper().Contains(query)).ToArray());
var variantTask = Task.Run(() => allCards.Where(x => x.Variant.ToUpper().Contains(query)).ToArray());
var cardNumberTask = Task.Run(() => allCards.Where(x => x.CardNumber.ToUpper().Contains(query)).ToArray());
var holoTask = Task.Run(() => allCards.Where(x => query == "HOLO" && x.IsHolo).ToArray());

Task.WaitAll(new Task[] {nameTask, setTask, variantTask, cardNumberTask, holoTask});

var result = (nameTask.Result + setTask.Result + variantTask.Result + cardNumberTask.Result + holoTask.Result).Distinct().ToArray();
Mikael Dúi Bolinder
  • Running bad code using multiple threads only *multiplies* the waste. In this case it will accelerate the generation of garbage objects – Panagiotis Kanavos Mar 06 '19 at 09:24
  • How is it bad to run parallel queries? SQL Server does it and the performance is great. – Mikael Dúi Bolinder Mar 06 '19 at 09:25
  • In fact, this code won't take *less* time as each task scans the entire list. At best it will take the same time but use more cores. Worst case, it will take *more* time – Panagiotis Kanavos Mar 06 '19 at 09:25
  • This isn't a parallel query at all. And SQL Server doesn't scan the same data multiple times, it uses multiple cores in the same query to process the same data faster. – Panagiotis Kanavos Mar 06 '19 at 09:26
  • So, these aren't LINQ queries? And Tasks can't run in parallel? – Mikael Dúi Bolinder Mar 06 '19 at 09:27
  • *Parallel* processing in this case would mean splitting the data into batches and filtering each batch in parallel. That's what `AsParallel()` would do when added to a LINQ query. *This* code though scans the entire list multiple times. It won't even produce the correct results - you'd need a `Union` operation to combine the results from each task. And no, you can't "add" arrays – Panagiotis Kanavos Mar 06 '19 at 09:28
  • PLINQ though would partition the data, filter and collect the first 10 results automatically. No raw tasks needed, no `Distinct()` step at the end. That's how SQL Server works too. If you check any parallelized execution plan you'll see parallel operators that work on partitions, followed by a stream aggregation step that combines the results at the end – Panagiotis Kanavos Mar 06 '19 at 09:32
  • Doing some simple math: doing one comparison is faster than doing 5. If the result set is huge, using this code would still improve performance. If allCards is only 10 cards, then it might decrease performance. – Mikael Dúi Bolinder Mar 06 '19 at 09:32
  • No it wouldn't, in the slightest. That's because for 1000 items you'd be doing 1000 comparisons 5 times over, for a total of 5000 iterations. Each of those 5 passes would still scan all 1000 items, so it would take the same time. On an 8-core machine you wouldn't see any delays. On a quad, though, you'd take *twice* as long - that 5th comparison would have to wait for the others to finish, and take just as long. On a dual core, you'd need 3 times as long. And you haven't included the time needed to join the results and return distinct values. – Panagiotis Kanavos Mar 06 '19 at 09:35
  • Well, if you have an 8 core machine - why not use it? Why build an app compatible with Windows NT 16 bit @ 16 MHz CPU when you have a 100 core CPU @ 5 GHz? – Mikael Dúi Bolinder Mar 06 '19 at 09:37
  • `Distinct()` requires a Sort, an expensive operation that in this case is executed using a single thread. – Panagiotis Kanavos Mar 06 '19 at 09:37
  • Because *you aren't using it at all*. You are *wasting power* doing the same thing 5 times over for *no* benefit. Do you want to run faster? Partition the data and use one task per partition to perform the full filtering. Or let PLINQ do that with `resultList.AsParallel().Where(x=> String.Contains(x.Name,query,true) || ...).Take(10)`. This will create as many partitions as there are cores, use a task for each partition and pick the first 10 results *without* an expensive sort at the end – Panagiotis Kanavos Mar 06 '19 at 09:39
  • @PanagiotisKanavos why don't you suggest `AsParallel` as an answer? I suggested this way as an alternative, maybe the `CardRepo` has methods we don't know about, that would need to use this code and not `AsParallel`? – Mikael Dúi Bolinder Mar 06 '19 at 09:39
  • Because it's probably *not* needed. The *actual* improvement is using a case-invariant `Contains` and avoiding all the waste. That's good for a comment at most until the OP provides more details – Panagiotis Kanavos Mar 06 '19 at 09:42
  • Come to think of it, the OP is searching for *multiple* terms. A regex that searches for multiple words at once would probably get rid of that too. – Panagiotis Kanavos Mar 06 '19 at 09:43
  • Loading everything into memory is a very bad idea. – BartoszKP Mar 06 '19 at 11:17
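To round off the PLINQ approach described in the comments above, here is a minimal sketch (an illustration, not code from the thread); resultList (an in-memory List<Card>) and query (a single lower-cased term) are assumed names, and the remaining column checks from the question would be OR-ed in the same way:

    // Sketch only: AsParallel() partitions the in-memory list across the
    // available cores, each partition applies the filter, and Take(10) limits
    // the merged output to the first ten matches without a final sort.
    var suggestions = resultList
        .AsParallel()
        .Where(x => x.Name.IndexOf(query, StringComparison.OrdinalIgnoreCase) >= 0 ||
                    (query == "holo" && x.IsHolo))
        .Take(10)
        .ToList();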