
So I have written some code that takes two strings, splits them, compares each word of one string with the other, and reports when the same word exists in both. Would this be an efficient way to compare words over a large volume of text, say 300 to 10,000 words? Since `string.Split` works with arrays, would it hog the computer's memory? Sorry, I'm still learning A-level CS, so I hardly know any terminology.

I heard that regex would be extremely good at this kind of thing, but it's pretty confusing.

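```csharp
using System;

class Program
{
    static void Main(string[] args)
    {
        string text1 = "yeet went the black fox  cry went the chicken";
        string text2 = "yeet  the  fox  cry  the ";

        string[] spaced1 = text1.Split(' ');
        string[] spaced2 = text2.Split(' ');

        // Compare words at the same index only. Stopping at the shorter
        // array avoids an IndexOutOfRangeException if text2 has fewer words.
        int count = Math.Min(spaced1.Length, spaced2.Length);
        for (int s = 0; s < count; s++)
        {
            if (spaced1[s] == spaced2[s])
            {
                Console.WriteLine("same word");
                Console.WriteLine(spaced1[s]);
            }
        }

        Console.ReadLine();
    }
}
```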

This specific code gives the results I want, but I still need to make it split at commas, full stops, etc.
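For the commas and full stops, one option that needs no regex is the `string.Split` overload that takes a `char[]` of separators together with `StringSplitOptions.RemoveEmptyEntries`. That option also discards the empty strings that double spaces (like the ones in `text1` above) produce. This is a minimal sketch; the separator list and the sample sentence are assumptions, so extend the list with whatever punctuation your text contains.

```csharp
using System;

string text = "Yeet, went the black fox.  Cry went the chicken!";

// Assumed separator set: space plus common punctuation; extend as needed.
char[] separators = { ' ', ',', '.', '!', '?', ';', ':' };

// RemoveEmptyEntries drops the "" entries that appear between consecutive
// separators, e.g. the comma and space after "Yeet" or the two spaces after "fox.".
string[] words = text.Split(separators, StringSplitOptions.RemoveEmptyEntries);

foreach (string word in words)
{
    Console.WriteLine(word);
}
```

Note that the punctuation is removed entirely, so `"fox."` and `"fox"` become the same word when you compare them.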

Rand Random
  • 10,000 words is peanuts these days. – spender Aug 21 '19 at 16:27
  • In general, [string operations are almost always more efficient than regex operations](https://stackoverflow.com/questions/16638637/whats-faster-regex-or-string-operations). The code you have is fine. Note, however, that you're only comparing words at the same index in both strings, so it only returns matches where the word is identical (case-sensitive) *and* is in the same position in both strings. – Rufus L Aug 21 '19 at 16:49
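To illustrate the comment's point about position, here is a sketch (not the asker's code) that reports every word of the first text that occurs *anywhere* in the second, ignoring case. It loads the second text's words into a `HashSet<string>` built with `StringComparer.OrdinalIgnoreCase`. Each lookup is then constant time on average, so the whole pass stays roughly linear in the number of words. At 10,000 words the arrays and set take up roughly a few hundred kilobytes, which is not a concern. The helper name `CommonWords` and the separator set are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns the words of `first` that occur anywhere in
// `second`, compared case-insensitively, each reported once in first-seen order.
List<string> CommonWords(string first, string second)
{
    char[] separators = { ' ', ',', '.', '!', '?', ';', ':' };
    string[] words1 = first.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    string[] words2 = second.Split(separators, StringSplitOptions.RemoveEmptyEntries);

    // Set of the second text's words; Contains is O(1) on average.
    var lookup = new HashSet<string>(words2, StringComparer.OrdinalIgnoreCase);
    var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    var common = new List<string>();

    foreach (string word in words1)
    {
        // seen.Add returns false if this word was already reported.
        if (lookup.Contains(word) && seen.Add(word))
        {
            common.Add(word);
        }
    }
    return common;
}

List<string> result = CommonWords(
    "yeet went the black fox  cry went the chicken",
    "Yeet  the  fox  cry  the ");
Console.WriteLine(string.Join(", ", result));
```

If you do want the original same-position behaviour, keep the index-based loop from the question instead; the set-based version answers a different question.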

2 Answers


I'm not entirely certain what you're trying to achieve here, but I'm assuming it's a learning project.

What you're doing is finding the items that exist in both of your arrays. For this you can use the LINQ `Intersect` method.

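```csharp
using System;
using System.Collections.Generic;
using System.Linq; // Intersect is a LINQ extension method

string text1 = "yeet went the black fox  cry went the chicken";
string text2 = "yeet  the  fox  cry  the ";

// RemoveEmptyEntries stops the "" produced by double spaces counting as a word
string[] spaced1 = text1.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
string[] spaced2 = text2.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

IEnumerable<string> output = spaced1.Intersect(spaced2);
```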

This gives you the words that appear in both strings, each listed once.

touchofevil
  • OP said he already has the desired output. He just wants to know if using a regular expression would be more memory efficient than string.split. – Casey Crookston Aug 21 '19 at 16:40
  • Will this work with StreamReader? That is, if I just got all the words from a file. –  Aug 21 '19 at 16:41
  • The code sample in the question only returns words that are identical *and are in the same position*. The code in this answer returns all words that match, regardless of position. Not sure if that matters. – Rufus L Aug 21 '19 at 16:46
  • Ahhh, that makes more sense. –  Aug 21 '19 at 16:56
  • So I implemented the `IEnumerable<string> output = spaced1.Intersect(spaced2);` code, but it prints "System.Linq.Enumerable+d__77`1[System.String]" instead of the desired strings. I understand that this is not an error, but I have no idea how to actually output them. –  Aug 21 '19 at 17:24
  • `output` is not a `string` so you can't just output it, it's a collection of strings, so you have to do something like `output.ToList().ForEach (x => Console.WriteLine(x))` – iakobski Aug 21 '19 at 19:59
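Following on from the last comment: calling `ToString()` on a LINQ result prints only its type name, which is what produced the `System.Linq.Enumerable+...` text. Two common ways to print the items are `string.Join` and a plain `foreach`. The sketch below shows both, reusing the answer's `Intersect` call with empty entries removed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] spaced1 = "yeet went the black fox  cry went the chicken"
    .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
string[] spaced2 = "yeet  the  fox  cry  the "
    .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

IEnumerable<string> output = spaced1.Intersect(spaced2);

// Option 1: join all items into a single string and print it on one line.
string joined = string.Join(", ", output);
Console.WriteLine(joined);

// Option 2: enumerate the collection and print each item on its own line.
foreach (string word in output)
{
    Console.WriteLine(word);
}
```

Because `Intersect` is evaluated lazily, each of the two options re-runs the comparison. If you need the result several times, call `.ToList()` once and reuse the list.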

If you have to deal with a large number of words, I would expect them to be stored in a file, in which case you can read them using a Stream. For 10,000 words you don't have to worry, as that is not a large number these days. You can look here at how many words a string can hold. Cheers
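A sketch of the file-based approach, which also speaks to the earlier comment asking about StreamReader. The hypothetical helper `ReadWords` accepts any `TextReader`. `StreamReader` derives from `TextReader`, so you can pass one opened on a file. The example uses `StringReader` so it runs without any files. Input is read one line at a time, so only the set of distinct words is kept in memory, never the whole file as one string. The file name `text2.txt` in the comment is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: reads every word from a TextReader (e.g. a StreamReader
// over a file) into a case-insensitive set of distinct words.
HashSet<string> ReadWords(TextReader reader)
{
    char[] separators = { ' ', ',', '.', '!', '?', ';', ':', '\t' };
    var words = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    string? line;
    // ReadLine returns null once the end of the input is reached.
    while ((line = reader.ReadLine()) != null)
    {
        foreach (string word in line.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            words.Add(word);
        }
    }
    return words;
}

// With a real file you might write (hypothetical path):
//   using var file = new StreamReader("text2.txt");
//   HashSet<string> words2 = ReadWords(file);
HashSet<string> words2 = ReadWords(new StringReader("yeet  the\nfox, cry.\nthe"));
HashSet<string> words1 = ReadWords(new StringReader("Yeet went the black fox\ncry went the chicken"));

// Keep only the words that also appear in the second input.
words1.IntersectWith(words2);
Console.WriteLine(string.Join(", ", words1));
```

`HashSet<T>` does not guarantee any particular ordering, so sort the result first if the printed order matters.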

GoldenAge