3

I have a folder with multiple subfolders. Each subfolder contains many .dot and .txt files.

Is there a simple solution in C# .NET that will iterate through each file and check the contents of that file for a key phrase or keyword?

Document Name        Keyword1         Keyword2         Keyword3        ...
  test.dot              Y               N                Y

To summarise:

  1. Select a folder
  2. Enter a list of keywords to search for
  3. The program will then search through each file and, at the end, output something like the table above. I am not too worried about creating the DataTable to show in the DataGrid, as I can do that myself. I just need to perform the find-in-files function, similar to Notepad++'s Find in Files option.

Thanks in advance

CR41G14

4 Answers

5

What you want is to recursively iterate over the files in a directory (and possibly its subdirectories).

So your steps would be to loop over each file in the specified directory with Directory.GetFiles() from .NET, and then, for each subdirectory you encounter, run the same loop again.

This can be easily done with this code sample:

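  // Requires: using System.Collections.Generic; using System.IO;
  public static IEnumerable<string> GetFiles(string path)
  {
        // Files directly inside this directory; replace the pattern with e.g. "*.txt".
        foreach (string s in Directory.GetFiles(path, "*.extension_here"))
        {
              yield return s;
        }

        // Recurse into each subdirectory and yield its files as well.
        foreach (string s in Directory.GetDirectories(path))
        {
              foreach (string s1 in GetFiles(s))
              {
                    yield return s1;
              }
        }
  }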

A more in-depth look at iterating through files in directories in .NET is located here:

http://blogs.msdn.com/b/brada/archive/2004/03/04/84069.aspx

Then use the String.IndexOf method to check whether your keywords appear in the file. I discourage the use of ReadAllText: if your file is 5 MB, the string holding it will be at least that large. Reading line by line is less memory-hungry.
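The line-by-line check described above could look roughly like the following. This is only a minimal sketch: the class name KeywordScanner and the method FileContains are made up for illustration, and it assumes a keyword never spans a line break and that case-insensitive matching is wanted.

```csharp
using System;
using System.IO;

public static class KeywordScanner
{
    // Hypothetical helper: returns true as soon as any line contains the keyword.
    public static bool FileContains(string path, string keyword)
    {
        // File.ReadLines streams the file lazily, so only one line is held at a time.
        foreach (string line in File.ReadLines(path))
        {
            // Assumption: case-insensitive match; use StringComparison.Ordinal otherwise.
            if (line.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

Combined with the GetFiles method above, each returned path can be passed to FileContains once per keyword.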

  • One thing to be aware of with Directory.GetFiles & Directory.GetDirectories is that they will throw a path too long exception when the path is greater than 260 characters - it has stung me recently. – David Oct 02 '12 at 13:24
  • @Dve I think windows doesn't support anymore than 260 chars. It will give you errors when you create files or directories which have too long filenames. –  Oct 02 '12 at 13:26
  • @Gam_Erix windows does, .Net doesnt – David Oct 02 '12 at 13:27
  • @Dve how can .NET give you errors with too long pathnames if Windows refuses to create files/directories with a pathname longer than 260 chars? :o well maybe if unpacking from a zip archive, then yes. –  Oct 02 '12 at 13:29
  • @Gam_Erix you can create folder paths longer than 260 chars in windows - try it. – David Oct 02 '12 at 13:29
  • Does this actually search through the contents of a file for keyword? – CR41G14 Oct 02 '12 at 13:32
  • \\data\CL-Home\792662\1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ1234\1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ1234\1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ1234\1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ1234\1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ1234\1234567890ABCDEFGHIJ Longest I can get. :o –  Oct 02 '12 at 13:35
3

You can use Directory.EnumerateFiles with a search pattern and the recursive option (SearchOption.AllDirectories). The rest is easy with LINQ:

var keyWords = new []{"Y","N","Y"};
var allDotFiles = Directory.EnumerateFiles(folder, "*.dot", SearchOption.AllDirectories);
var allTxtFiles = Directory.EnumerateFiles(folder, "*.txt", SearchOption.AllDirectories);
var allFiles = allDotFiles.Concat(allTxtFiles);
var allMatches = from fn in allFiles
                 from line in File.ReadLines(fn)
                 from kw in keyWords
                 where line.Contains(kw)
                 select new { 
                     File = fn,
                     Line = line,
                     Keyword = kw
                 };

foreach (var matchInfo in allMatches)
    Console.WriteLine("File => {0} Line => {1} Keyword => {2}"
        , matchInfo.File, matchInfo.Line, matchInfo.Keyword);

Note that you need to add using System.Linq; (and using System.IO; for Directory and File).

Is there a way just to get the line number?

If you just want the line numbers you can use this query:

var matches = allFiles.Select(fn => new
{
    File = fn,
    LineIndices = String.Join(",",
                File.ReadLines(fn)
                .Select((l,i) => new {Line=l, Index =i})
                .Where(x => keyWords.Any(w => x.Line.Contains(w)))
                .Select(x => x.Index + 1)), // Index is zero-based; +1 gives the line number
})
.Where(x => x.LineIndices.Any());

foreach (var match in matches)
    Console.WriteLine("File => {0} Linenumber => {1}"
        , match.File, match.LineIndices);

It's a little more difficult because LINQ's query syntax has no way to pass the element index, so method syntax is used instead.

Tim Schmelter
  • Depending on the number and size of the files, a compiled Regex might be significantly faster. http://stackoverflow.com/questions/2962670/regex-ismatch-vs-string-contains#3617013 – JDB Oct 02 '12 at 13:39
  • Excellent answer, is there a way just to get the line number? – CR41G14 Oct 02 '12 at 13:43
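The compiled Regex idea from the comment above might be sketched as follows. The names RegexKeywordSearch, BuildPattern and MatchingLineIndices are hypothetical, the keywords are treated as literal text (hence Regex.Escape), and whether this is actually faster than String.Contains depends on the files and keywords, so it is worth measuring.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexKeywordSearch
{
    // Builds one compiled pattern that matches any of the keywords literally.
    // Assumption: the keyword list is non-empty (an empty pattern matches every line).
    public static Regex BuildPattern(IEnumerable<string> keywords)
    {
        string alternation = string.Join("|", keywords.Select(k => Regex.Escape(k)));
        return new Regex(alternation, RegexOptions.Compiled);
    }

    // Returns the zero-based indices of all lines that match the pattern.
    public static List<int> MatchingLineIndices(string path, Regex pattern)
    {
        var indices = new List<int>();
        int index = 0;
        foreach (string line in File.ReadLines(path))
        {
            if (pattern.IsMatch(line))
            {
                indices.Add(index);
            }
            index++;
        }
        return indices;
    }
}
```

The pattern is built once and reused for every file, which is where a compiled Regex would be expected to pay off.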
2

The first step: locate and read all files. This is easily done with System.IO.Directory.GetFiles() to find the files and System.IO.File.ReadAllText() to read them, as others have mentioned.

The second step: find the keywords in a file. This is simple if you have one keyword, since it can be done with the IndexOf() method, but scanning a file once per keyword (especially a big file) is wasteful.

To quickly find multiple keywords in a text I think you should use the Aho-Corasick automaton (algorithm). See the C# implementation at CodeProject: http://www.codeproject.com/Articles/12383/Aho-Corasick-string-matching-in-C
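As a simpler illustration of reading each file only once, here is a minimal sketch that is not Aho-Corasick: it still calls IndexOf per keyword on every line, but it makes a single pass over the file and stops early once every keyword has been seen. The names KeywordMatrix, FindKeywords and FormatRow are hypothetical, and it assumes a keyword never spans a line break.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class KeywordMatrix
{
    // Hypothetical helper: one pass over the file, one found-flag per keyword.
    public static bool[] FindKeywords(string path, IList<string> keywords)
    {
        var found = new bool[keywords.Count];
        int remaining = keywords.Count;

        foreach (string line in File.ReadLines(path))
        {
            for (int i = 0; i < keywords.Count; i++)
            {
                if (!found[i] && line.IndexOf(keywords[i], StringComparison.Ordinal) >= 0)
                {
                    found[i] = true;
                    remaining--;
                }
            }
            if (remaining == 0)
            {
                break; // every keyword has been seen, no need to read further
            }
        }
        return found;
    }

    // Formats one tab-separated row in the shape of the table from the question.
    public static string FormatRow(string path, bool[] found)
    {
        return Path.GetFileName(path) + "\t" + string.Join("\t", found.Select(f => f ? "Y" : "N"));
    }
}
```

Calling FindKeywords for each file and printing FormatRow gives one Y/N row per document, which can then be loaded into a DataTable.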

Viktor Latypov
0

Here's a way using Tim's original answer to get the line number:

var keyWords = new[] { "Keyword1", "Keyword2", "Keyword3" };
var allDotFiles = Directory.EnumerateFiles(folder, "*.dot", SearchOption.AllDirectories);
var allTxtFiles = Directory.EnumerateFiles(folder, "*.txt", SearchOption.AllDirectories);
var allFiles = allDotFiles.Concat(allTxtFiles);
var allMatches = from fn in allFiles
                 from line in File.ReadLines(fn).Select((item, index) => new { LineNumber = index + 1, Line = item }) // index is zero-based
                 from kw in keyWords
                 where line.Line.Contains(kw)
                 select new
                 {
                     File = fn,
                     Line = line.Line,
                     LineNumber = line.LineNumber,
                     Keyword = kw
                 };

foreach (var matchInfo in allMatches)
    Console.WriteLine("File => {0} Line => {1} Keyword => {2} Line Number => {3}"
        , matchInfo.File, matchInfo.Line, matchInfo.Keyword, matchInfo.LineNumber);
Joey Gennari