
Hello, I am trying to calculate the cosine similarity between sentences, using a given array that consists of 5 words. I have an ASP.NET project where I wrote code to find the top 5 words (by frequency) in a text (around 50-60 sentences), and I have these words in an array K. Up to here everything is fine. Now I would like to take each sentence from the text (the text is an input: there is a text area in the app where the user can paste any text or article) and build a vector for it. For example, let's assume that the array is

K={technology, product, player}

and the given text is this:

Apple Inc. is an American multinational technology company headquartered in Cupertino, California, that designs, develops, and sells consumer electronics, computer software, and online services. Its hardware products include the iPhone smartphone, the iPad tablet computer, the Mac personal computer, the iPod portable media player, and the Apple Watch smartwatch. Apple's consumer software includes the OS X and iOS operating systems, the iTunes media player, the Safari web browser, and the iLife and iWork creativity and productivity suites. Its online services include the iTunes Store, the iOS App Store and Mac App Store, and iCloud.

So the four vectors for the four sentences should look like this:

s1={1,0,0} s2={0,1,1} s3={0,0,1} s4={0,0,0}

How can I build these vectors in ASP.NET?

dpointttt

1 Answer


I am using an SQL-style Like method and a Print2DArray method, both taken from other Stack Overflow questions. If you absolutely need to use arrays:

public static void Main(string[] args)
{
    string[] keywords = {...}; // your keywords
    string text = "..."; // your text

    // Naive sentence split: every '.' is treated as the end of a sentence.
    string[] textInArray = text.Split(new char[] { '.' }, StringSplitOptions.RemoveEmptyEntries);
    // vectors[i, j] holds how often keyword j occurs in sentence i.
    int[,] vectors = new int[textInArray.Length, keywords.Length];

    for (int i = 0; i < textInArray.Length; i++)
    {
        string[] words = textInArray[i].Split(' ');
        for (int j = 0; j < keywords.Length; j++)
        {
            foreach (var word in words)
            {
                // Like is the SQL-style helper: "%x%" matches any word containing x.
                if (Like(word, "%" + keywords[j] + "%"))
                {
                    vectors[i, j]++;
                }
            }
        }
    }
    Print2DArray(vectors);
}
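The snippet above calls Like and Print2DArray without showing them. Below is a minimal sketch of stand-ins, not the implementations from the linked answers. It assumes the only pattern ever passed is "%keyword%", so Like reduces to a case-insensitive substring test. The class name VectorHelpers is made up for illustration.

```csharp
using System;

public static class VectorHelpers
{
    // Hypothetical stand-in for the SQL-style Like helper.
    // Assumption: the pattern always has the form "%text%", so this strips
    // the '%' wildcards and does a case-insensitive substring test.
    public static bool Like(string input, string pattern)
    {
        string needle = pattern.Trim('%');
        return input.IndexOf(needle, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Prints one row per sentence, with the keyword counts separated by spaces.
    public static void Print2DArray(int[,] matrix)
    {
        for (int i = 0; i < matrix.GetLength(0); i++)
        {
            for (int j = 0; j < matrix.GetLength(1); j++)
            {
                Console.Write(matrix[i, j] + " ");
            }
            Console.WriteLine();
        }
    }
}
```

With these in place, the calls in Main would become VectorHelpers.Like(...) and VectorHelpers.Print2DArray(...), or the two methods can be copied into the same class as Main.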

Keep in mind that the SQL-style Like method I use treats "products" as a form of "product" and increments the counter, but it treats "productivity" as a form of "product" as well. The code also splits on the period in "Apple Inc.", so "Apple Inc" ends up as its own sentence. You need to fine-tune this, as it is a VERY BASIC version of what you want to accomplish. You could use a dictionary mapping each sentence (string) to an int array, or even a struct, to get this done more elegantly, but the basics are the same.
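One possible way to tighten the matching is sketched below. It swaps the substring test for a regular expression that matches a keyword only as a whole word, optionally followed by a plural "s", so "products" counts for "product" but "productivity" does not. It also includes a plain cosine similarity function, since that is the stated end goal. The names SentenceVectors, BuildVector and Cosine are made up for this example, and the sentence-splitting problem with "Apple Inc." is not addressed here.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SentenceVectors
{
    // Counts how often each keyword occurs in the sentence as a whole word,
    // optionally followed by "s". Matching is case-insensitive.
    public static int[] BuildVector(string sentence, string[] keywords)
    {
        return keywords
            .Select(k => Regex.Matches(
                sentence,
                @"\b" + Regex.Escape(k) + @"s?\b",
                RegexOptions.IgnoreCase).Count)
            .ToArray();
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    // Returns 0 when either vector is all zeros, to avoid dividing by zero.
    public static double Cosine(int[] a, int[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

For the example in the question, BuildVector on the second and third sentences with K={technology, product, player} gives {0,1,1} and {0,0,1}, and Cosine of those two vectors is 1/sqrt(2), roughly 0.707.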

I am here for further questions!

Community
  • First, thank you so much for your very helpful answer. Which library should I add to my code to be able to use the Like and Print2DArray methods? – dpointttt May 11 '16 at 09:12
  • I added these two namespaces: using System.Data.Linq.SqlClient; using System.Data.Linq; and called the method as if (SqlMethods.Like(word, "%"+keywords[j]+"%")). However, I got this error: "Method 'Boolean Like(System.String, System.String)' cannot be used on the client; it is only for translation to SQL." – dpointttt May 11 '16 at 09:42
  • However, when I used this instead of Like, it worked: if (word.Contains(keywords[j])) – dpointttt May 11 '16 at 10:17
  • I gave links to other Stack Overflow questions containing those methods (first line of my answer), but I did not think of string.Contains(). Great idea! – Daniel Tsvetkov May 11 '16 at 10:57