I want to find the index/offset of each keyword in a batch of keywords within a given text file. I've come across many questions on Stack Overflow, and this answer fits my case best.

The only issue is that those questions solve the problem for a single keyword, while I have more than 25 keywords to find, and I think there has to be a better solution than writing a switch...case or if...else for each keyword.

How can I optimize my task here? Any better approach other than the linked question is also welcome.

Let's say my text file has the following content:

Stephen Haren,December,9,4055551235

Laura Clausing,January,23,4054447788

William Connor,December,13,123456789

Kara Marie,October,23,1593574862

Audrey Carrit,January,16,1684527548

Sebastian Baker,October,23,9184569876

And the keywords I want to find are:

December, January, March, April, May

Now, the output should be:

December : 16

January : Overall Index of January in line 2

December : Overall Index of December in line 3

....

Current code:

using System;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        var keyword = "December";
        var keyword2 = "January";
        // Characters read so far; line terminators are not counted.
        int totalLength = 0;
        using (var sr = new StreamReader("file.txt"))
        {
            while (!sr.EndOfStream)
            {
                var line = sr.ReadLine();
                if (String.IsNullOrEmpty(line)) continue;
                if (line.IndexOf(keyword, StringComparison.CurrentCultureIgnoreCase) >= 0)
                {
                    // Overall offset = length of the previous lines + index within this line.
                    Console.WriteLine("December: " + (totalLength + line.IndexOf(keyword, StringComparison.CurrentCultureIgnoreCase)));
                }
                if (line.IndexOf(keyword2, StringComparison.CurrentCultureIgnoreCase) >= 0)
                {
                    Console.WriteLine("January: " + (totalLength + line.IndexOf(keyword2, StringComparison.CurrentCultureIgnoreCase)));
                }
                // ... (a similar if block for every other keyword) ...

                totalLength += line.Length;
            }
        }
    }
}

Note: I'm tagging Java here too, as I am interested in the approach rather than a language-specific solution.

Dheeraj Kumar
Harshil Doshi

1 Answer

I would definitely suggest you check out a Trie data structure.

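A good trie implementation searches the input text character by character and reports a match whenever it reaches the end of a recognized word. It's extremely efficient in terms of runtime (approx. O(n + m)) and not bad when it comes to memory consumption either. (FYI: here n is the length of the input text and m is the total length of the words you're looking for. The O(n + m) figure strictly applies when the trie is extended with failure links, as in the Aho-Corasick algorithm; a plain trie that restarts at every text position is O(n · L) in the worst case, where L is the length of the longest keyword, which is still close to linear for short keywords.)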
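As a rough illustration of the trie approach described above, here is a minimal sketch in Java (the question is tagged Java as well). The class name `KeywordTrie` and its methods `insert` and `findAll` are made up for this example and are not taken from either linked implementation. It tries a match at every starting position of the text, so it is a plain trie scan rather than an Aho-Corasick automaton. It compares characters case-insensitively via `Character.toLowerCase`. Offsets are counted over the whole string, newline characters included, which differs from the code in the question, where line terminators are not counted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration: finds all keywords in one pass over the start positions.
public class KeywordTrie {
    // One trie node: children keyed by lower-cased character,
    // plus the keyword that ends at this node (null if none does).
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        String keyword;
    }

    private final Node root = new Node();

    // Insert a keyword; characters are lower-cased one by one so matching ignores case.
    public void insert(String keyword) {
        Node node = root;
        for (int i = 0; i < keyword.length(); i++) {
            char c = Character.toLowerCase(keyword.charAt(i));
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.keyword = keyword;
    }

    // Return every occurrence of every inserted keyword as "keyword : offset",
    // ordered by the offset at which the match starts.
    public List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            // Walk down the trie for as long as the text keeps matching.
            for (int i = start; i < text.length(); i++) {
                node = node.children.get(Character.toLowerCase(text.charAt(i)));
                if (node == null) break;
                if (node.keyword != null) hits.add(node.keyword + " : " + start);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        KeywordTrie trie = new KeywordTrie();
        for (String k : new String[] {"December", "January", "March", "April", "May"}) {
            trie.insert(k);
        }
        // First two lines of the sample file, joined with '\n' (an assumption about line endings).
        String text = "Stephen Haren,December,9,4055551235\n"
                    + "Laura Clausing,January,23,4054447788\n";
        trie.findAll(text).forEach(System.out::println);
    }
}
```

For the two sample lines in `main`, this prints `December : 14` and `January : 51`. Adding a keyword is a single `insert` call, so no per-keyword if...else is needed. Reading the file into one string first keeps the offsets global without a running total.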

I found the following tutorial on the subject:

https://www.geeksforgeeks.org/trie-insert-and-search/

And here is a good implementation I found via StackOverflow:

http://www.glennslayden.com/code/c-sharp/trie

Eric McLachlan