3

I've created a .net MVC application in C# that lists organisations, there are currently 6000 records in the database (SQLServer). The fields in the organisation table are:

  • Title (Alcohol support group)
  • ContactPerson (James Bond)
  • Content (We provide support to people with alcohol problems)
  • Keywords (Alcohol, Addiction, Alcoholic)

The current search is done using linq, e.g. :

iList<Organisation> orglist = myOrgs.Where(x => x.Title.Contains('abc') || 
                                                x.ContactPerson.Contains('abc') || 
                                                x.Details.Contains('abc') || 
                                                x.Keywords.Contains('abc'))
                                    .OrderBy(x => x.Title).ToList();

The results are then ordered by title. Which is not suitable.

If someone searches for 'Alcohol Support' I want the above result at the top of the list.

I would like the results to be ranked on the following:

  1. Full sentence matches in organisation titles.
  2. All search terms in organisation titles.
  3. Any search terms in organisation titles.
  4. Any search terms in organisation keywords.
  5. Full sentence matches in organisation content.

Looking for advice on the best way to implement this, or if anyone knows of any algorithms/libraries out there that do this already?

** Update ** Im looking at a simpler solution now, see this link:

Sorting collection based on keywords with Linq

Dez79
  • 527
  • 2
  • 9
  • 25
  • 1
    The results aren't sorted by anything, they are returned in the order they were found in the source list. *Your code* will have to specify the sort order based on a field or ranking method with [OrderBy](https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.orderby?view=netframework-4.7.2#System_Linq_Enumerable_OrderBy__2_System_Collections_Generic_IEnumerable___0__System_Func___0___1__System_Collections_Generic_IComparer___1__). If no ranking field exists, your code will have to provide the comparer used to order the results. – Panagiotis Kanavos Jan 07 '19 at 12:13
  • You need a custom IComparer : https://learn.microsoft.com/en-us/dotnet/api/system.collections.icomparer?view=netframework-4.7.2 – jdweng Jan 07 '19 at 12:14
  • If you want the *database* to rank results you'll have to use *the database's* Full-Text-Search features for that. Performance will be a lot faster than wildcard searches across multiple fields in all table rows. SQL Server for example offers FTS with ranking, accelerated by free text search indexes. [This example](https://learn.microsoft.com/en-us/sql/relational-databases/search/query-with-full-text-search?view=sql-server-2017) how to perform various queries and filder by rank – Panagiotis Kanavos Jan 07 '19 at 12:15
  • BTW what database are you using? – Panagiotis Kanavos Jan 07 '19 at 12:18
  • @PanagiotisKanavos thanks, its a SQL Server DB – Dez79 Jan 07 '19 at 12:19
  • I guess doing it form database would be best. But if you want to stay in linq, Idea I have is creating a set of search results, which are concat together and then destinct. example: myorgs.Where(x => x.Title.Contains(searchText)) .Concat(myorgs.Where(x => searchText.Split(' ').All(con => x.Title.Contains(con)))) .Concat(myorgs.Where(x => searchText.Split(' ').Any(con => x.Title.Contains(con)))) // more .Distinct(); – Malior Jan 07 '19 at 12:26
  • You can use SQL Server's FTS features then. EF Core 2.1 supports `FreeText` as shown in [this SQ question](https://stackoverflow.com/questions/51047514/how-to-use-freetext-in-ef-core-2-1). Other functions can be used through `FromSql` – Panagiotis Kanavos Jan 07 '19 at 12:27
  • @Malior the only way for this to work would be to load everything into memory. This can't be translated to SQL. If you load everything into memory, you might as well use a specialized library for full-text search. Your query doesn't rank the results in any case – Panagiotis Kanavos Jan 07 '19 at 12:28
  • @PanagiotisKanavos I'm still using .net 4.6, so cant update to EF core as this is part of a wider application – Dez79 Jan 07 '19 at 12:34
  • @Dez79 1) that's *another* thing you should have mentioned in the question itself and 2) doesn't change the fact that you need FTS on the server. If you search for EF and FTS you'll find a lot of ways you can call functions that aren't already supported. You can run raw SQL, map a TVF function to a context function etc. – Panagiotis Kanavos Jan 07 '19 at 12:56

1 Answers1

0

Summary :

  • Requirement's

    • R01 | titles | Full matches |in order
    • R02 | titles | Full matches |in any order
    • R03 | titles | Any matches |
    • R04 | keywords | Any matches |
    • R05 | content | Full matches |
  • For each Requirement We will do SQL Call

  • Each SQl Call we will return Only rows id's

  • Then we Group the Id's in order

  • We will do a final SQL call


Step 01 : R01

  • Here we will use EF

    db.Orgs.Where(w => w.Title.Contains(search_query))
    .Select(s => s.Id).ToList();
    

This use of linq2Sql Contains will be translated to sql WHERE IN

Step 02 : R02

  • here we will use plain sql WHERE + LIKE + AND

    Select Id From Orgs where 
    Title LIKE '%' + @param0 +'%' 
    and Title LIKE '%' + @param1 +'%' 
    

Step 03 : R03

  • here we will use plain sql WHERE + OR + And

    Select Id From Orgs 
    where Title LIKE '%' + @param0 +'%'  
    or  Title LIKE '%' + @param1 +'%' 
    

Step 04 : R04

  • here we will use plain sql WHERE + OR + And

    Select Id From Orgs 
    Where Keywords LIKE '%' + @param0 +'%' 
    or  Keywords LIKE '%' + @param1 +'%' 
    

Step 05 : R05

  • Here we will use EF

        db.Orgs
        .Where(w => w.Content.Contains(search_query)).
        Select(s => s.Id).ToList();
    

This use of linq2Sql Contains will be translated to sql WHERE IN

Step 06 - Group The Id's and ignore duplicable one's

  • using the row id's form step 1 to 5

we will sort the id's base on order retrieval

        var ids = new Dictionary<int, int>();

        foreach (var id in Ids1)
        {
            int val;
            if (!ids.TryGetValue(id, out val))
            {
                ids.Add(id, ids.Count());
            }

        };
        .
        .

Step 07 - Re-order

         ids.OrderByDescending(o => o.Value)
        .Select(s => s.Key) .ToArray();

Step 08 - Now we wi use The Oredred Id's to Get The data

  • here we will use plain sql ORDER BY + CASE WHEN THEN ELSE END

    Select * from Orgs 
    where  Id in ( 2 , 1 )  
    ORDER BY CASE id  
    WHEN 2 THEN 0 
    WHEN 1 THEN 1 
    ELSE 2 END
    

Step 09 Full Code


    using System;
    using System.Collections.Generic;
    using System.Linq;

    namespace ConsoleApp9
    {
    class Program
    {
    static void search(string search_query)
    {
    //////////////////////////////////////////////////
    var terms = search_query.Split(' ');
    //////////////////////////////////////////////////
    var Ids1 = db.Orgs.
    Where(w => w.Title.Contains(search_query))
    .Select(s => s.Id).ToList();

    var Ids2 = db.Database
    .SqlQuery<int>(getWhere("Title", "AND"), terms)
    .ToList();

    var Ids3 = db.Database
    .SqlQuery<int>(getWhere("Title", "OR"), terms)
    .ToList();

    var Ids4 = db.Database
    .SqlQuery<int>(getWhere("Keywords", "OR"), terms)
    .ToList();

    var Ids5 = db.Orgs
    .Where(w => w.Content.Contains(search_query))
    .Select(s => s.Id).ToList();

    var ordered_ids = reorderList(Ids1, Ids2, Ids3, Ids4, Ids5);

    var OrderedData = db.Database.SqlQuery<Org>(getOrdered(ordered_ids)).ToList();

    //////////////////////////////////////////////////

    foreach (var item in OrderedData)
    {
        Console.WriteLine($"{item.Id} - {item.Title} - {item.ContactPerson } - {item.Keywords } - {item.Content  }");

    }

    //////////////////////////////////////////////////
    Console.ReadLine();
    //////////////////////////////////////////////////
    string getWhere(string _column, string _oprator)
    {
        var val = "Select Id From Orgs where ";
        for (int i = 0; i < terms.Length; i++)
        {
            if (i > 0) val += @" " + _oprator + " ";
            val += @" " + _column + " LIKE '%' + {" + i + "} +'%'  ";
        }
        return val;
    }
    //////////////////////////////////////////////////
    string getOrdered(int[] _ids_ordered)
    {
        var val = "Select * From Orgs where ";
        val += " Id in ";
        for (int i = 0; i < _ids_ordered.Length; i++)
        {
            if (i == 0) val += "( ";
            if (i > 0) val += " , ";
            val += _ids_ordered[i];
            if (i == terms.Length - 1) val += " ) ";
        }
        val += " ORDER BY CASE id ";
        for (int i = 0; i < _ids_ordered.Length; i++)
        {
            val += " WHEN " + _ids_ordered[i] + " THEN " + i;
        }
        val += " ELSE " + _ids_ordered.Length + " END ";

        return val;
    }
    //////////////////////////////////////////////////
    int[] reorderList(List<int> _Ids1, List<int> _Ids2,
        List<int> _Ids3, List<int> _Ids4, List<int> _Ids5)
    {
        var idsDic = new Dictionary<int, int>();

        var idsArr = new List<int>[5] { Ids1, Ids2, Ids3, Ids4, Ids5 };
        for (int i = 0; i < 5; i++)
        {
            idsArr[i].ForEach(id =>
            {
                if (!idsDic.TryGetValue(id, out int val))
                    idsDic.Add(id, idsDic.Count());
            });
        };
        var o_ids = idsDic.OrderByDescending(o => o.Value)
                .Select(s => s.Key).ToArray();
        return o_ids;
    }
    }

    static Model1 db = new Model1();

    static void Main(string[] args)
    {
    string search_quer = "Alcohol Support";
    Console.WriteLine($"searching for {search_quer}");
    search("Alcohol Support");
    }
    }


    }

Note 01 : Sql Injection

  • What is Sql Injection

    SQL injection is a code injection technique, used to attack data-driven applications, in which nefarious SQL statements are inserted into an entry field for execution

Note 01.01 :The Problem

Note 01.02 : Microsoft documentation

  • From Microsoft documentation :How to: Directly Execute SQL Queries | Microsoft Docs

    The parameters are expressed in the query text by using the same curly notation used by Console.WriteLine() and String.Format(). In fact, String.Format() is actually called on the query string you provide, substituting the curly braced parameters with generated parameter names such as @p0, @p1 …, @p(n). enter image description here

Note 01.03 : In this project

  • When using EF 6.2

    var sql2 = " Select Id From Orgs where ";
    for (int i = 0; i < terms.Length; i++)
    {
        if (i > 0) sql2 += @" and ";
        sql2 += @" Title LIKE '%' + {" + i + "} +'%'  ";
    }
    
  • Will generate :

    Select Id From Orgs where  
    Title LIKE '%' + {0} +'%'   
    and  Title LIKE '%' + {1} +'%'  
    
  • In sqlserver using SQL Server Profiler

    exec sp_executesql N' Select Id From Orgs where  
    Title LIKE ''%'' + @p0 +''%''   and  Title 
    LIKE ''%'' + @p1 +''%''  ',N'@p0 nvarchar(7)
    ,@p1 nvarchar(7)',@p0=N'Alcohol',@p1=N'Support'
    

Note 01.04 : Another format

  • we can also use SqlParameter class

    var sql4 = " Select Id From Orgs where  ";
    var sql4_parameter = new List<SqlParameter>();
    for (int i = 0; i < terms.Length; i++)
    {
        if (i > 0) sql4 += @" or ";
        sql4 += @" Keywords LIKE '%' + @param" + i + " +'%'  ";
        sql4_parameter.Add(new SqlParameter("param" + i, terms[i]));
    }
    
  • and here is sql

     Select Id From Orgs 
     Where Keywords LIKE '%' + @param0 +'%' 
     or  Keywords LIKE '%' + @param1 +'%'  
    
Mohamed Elrashid
  • 8,125
  • 6
  • 31
  • 46