2

I have a method that checks if the current item exists in the database before it adds it to the database, if it does exist it deletes the item else it adds it.

Is there any better way of doing this? Because right now the titles have to be exactly the same. If the titles have a char/word difference then it wont delete it.

Basically what I mean is this:

If title is "Ronaldo lost his right leg" and there is a title in the database that is "Ronaldo lost his right leg yesterday" it should delete current item.

Another example:

If title is "hello world" and there is a title in the database that is "hello world everyone" it should delete current item.

So basically if the text has common words it should delete the item.

Here is the method I have so far:

public void AddNews(News news)
    {
        var exists = db.News.Any(x => x.Title == news.Title);

         if (exists == false)
        {
            db.News.AddObject(news);
        }
        else
        {
            db.News.DeleteObject(news);
        }
    }

Any kind of help is appreciated.

Obsivus
  • 8,231
  • 13
  • 52
  • 97
  • C'mon, use triggers, you need `instead of` triggers, you may find it [here](http://www.codeproject.com/Articles/616114/SQL-Server-T-SQL-Tips-Tricks#insteadoftrg) for help –  Jul 16 '13 at 20:21
  • Please define clearly when two titles should be considered equal. You say: "So basically if the text has common words it should delete the item". This means that "Ronaldo lost his right leg" and "David lost his keys" are the same. I strongly doubt that this is your intent. – pescolino Jul 21 '13 at 16:15
  • Also define that how many words should match to consider them equal. And whether they should be in the beginning or anywhere in the line or only at the end of the line. – Nisarg Shah Jul 22 '13 at 06:25
  • I think you might want to consider something derived from the concept of Latent Semantic Analysis. Basically you want to use an exclusion list to omit unimportant terms such as if, of, with, etc., and then determine a statistical match for the remaining keywords with some weighting for important highly relevant terms. Have a look here for an example ` `[web]:` http://www.puffinwarellc.com/index.php/news-and-articles/articles/… The tutorial on SVD that leads up to it is also an interesting read, perhaps less relevant. – John Faulkner Jul 22 '13 at 08:46

5 Answers5

0

I don't know C#, but BASIC has instr$ and javascript has indexOf()... C# probably has something similar which will check if your string is present in the other string - which means it will show as a match for "hello" or "hello world" or "world hello" if you search "hello", but "hello world" won't find "world hello"... since I don't know C# this isn't valid code but should set you on the right track...

var dbTitle = wherever you get the existing titles from
var yourSearchTerm = what you want to find

if (dbTitle.indexOf(yourSearchTerm)>0) { //indexOf() returns -1 if match not found
db.News.AddObject(news);
}
else {
db.News.DeleteObject(news);
}

Search string manipulation in your help file to find the right command.

Kat Cox
  • 3,469
  • 2
  • 17
  • 29
0

Your question raises more questions than it invites to provide a clear answer. May I ask why you'd ever want to delete an item in a method that's called AddItem? Don't you mean you want to update an item when it's identifier matches the one that's provided as an argument?

Having said that, there's basically two options to implement the desired behavior: perform the match in code, or perform the match in the database. However, both naturally require you two define exactly what it means two have a match. The other answers already hint to that.

The advantage of performing the match in code is flexibility: it's generally easier to pogram (complex) logic of this kind in C# that it is to write it in SQL. Plus you'll get the persistance-code for free since you can use EF (as I assume you're using looking at the code sample).

The advantage of doing it in SQL is that it's more performant, because you don't have to retrieve the entire entity from the database before you make the insert/update/delete decision. You can do this by adding an INSTEAD-OF INSERT trigger on the entity-table and perform an update/delete whenever you find the provided entity actually matches an existing one.

Wim.van.Gool
  • 1,290
  • 1
  • 10
  • 19
0

First, I agree with @Jonesy that the strings could be split into words using

string[] list1 = myStr.Split(null);

The null forces splitting on whitespace. See: Best way to specify whitespace in a String.Split operation

and those words can be put into lists. The intersection of the lists right away tells you which words match exactly, and how many words match exactly. Any other words are words that don't match.

var result = list1.Intersect(list2, StringComparer.InvariantCultureIgnoreCase);

So for the words that don't match, you can get a score for each word comparison using the Levenshtein distance. I included code below but haven't tested to see if this is a correctly working implementation. Anyway, the reason to use this is that you can compare each word by how many operations it takes to make one word match another. So misspelled words that are very close can be counted as equal.

However, as has been pointed out, the whole process is going to be very error prone. What it sounds like you really want to do is compare the MEANING of the two strings, and while we are making advances in that direction, I am not aware of any C# over the counter AI for parsing meaning from sentences yet.

using System;

/// <summary>
/// Contains approximate string matching
/// </summary>
static class LevenshteinDistance
{
    /// <summary>
    /// Compute the distance between two strings.
    /// </summary>
    public static int Compute(string s, string t)
    {
    int n = s.Length;
    int m = t.Length;
    int[,] d = new int[n + 1, m + 1];

    // Step 1
    if (n == 0)
    {
        return m;
    }

    if (m == 0)
    {
        return n;
    }

    // Step 2
    for (int i = 0; i <= n; d[i, 0] = i++)
    {
    }

    for (int j = 0; j <= m; d[0, j] = j++)
    {
    }

    // Step 3
    for (int i = 1; i <= n; i++)
    {
        //Step 4
        for (int j = 1; j <= m; j++)
        {
        // Step 5
        int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;

        // Step 6
        d[i, j] = Math.Min(
            Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
            d[i - 1, j - 1] + cost);
        }
    }
    // Step 7
    return d[n, m];
    }
}

Cited from here: http://www.dotnetperls.com/levenshtein

Community
  • 1
  • 1
Ted
  • 3,212
  • 25
  • 20
0

I have a method that checks if the current item exists in the database before it adds it to the database, if it does exist it deletes the item else it adds it.

Are you sure you want to delete (and re-add?) the item when found? You probably want to find a way to update the data instead. That will be much more efficient and less error prone. (For example, if you use delete, your record will be missing for everyone for a few miliseconds, and it will be gone forever if the client crashes at the wrong time.)

Also, you probably want to record all the things that the user types.

1) they are helpful for later mapping "what people searched for" to "what people really wanted". If one person typos, it's likely other people will typo that same way. (i.e. people rarely type "tqe" when typing "the". But they type "teh" all the time.)

2) you never know which one is the "best". More words isn't always better.

You're probably better off having a name table with "name, item_id" that allows multiple names to map to the same item in the items table that has the item attributes.

BraveNewCurrency
  • 12,654
  • 2
  • 42
  • 50
-1

if the title just have a small char difference it wont delete it.

using ToUpper() on both ends will ensure a valid check, even if the casing is different

var exists = db.News.Any(x => x.Title.ToUpper() == news.Title.ToUpper());

if you want other ways to check if an object exists, we need more information.

Update

Going on your comment, you can strip all non-alphanumeric characters from it

Regex rgx = new Regex("[^a-zA-Z0-9 -]");
var exists = db.News.Any(x => rgx.Replace(x.Title.ToUpper(), "") == rgx.Replace(news.Title.ToUpper(), ""));

and "Hello world" will match "Hello World!"

Jonesopolis
  • 25,034
  • 12
  • 68
  • 112