I'm still very new to C# and was hoping somebody here could help. I have a drive with folders nested within folders, all containing PDF files. Is there a way to recursively loop through these files, read them, and write the data to a .txt file? I'm not sure how to implement this in my console app. Does anybody have any code that might help?

I tried this program, but it throws the error "c:\anil not found as file or resource."

class Program
{

    static void Main(string[] args)
    {
        DirectoryInfo di = new DirectoryInfo(@"C:\anil");
        FileInfo[] pdfFiles = di.GetFiles("*.pdf", SearchOption.AllDirectories);
        foreach (FileInfo pdf in pdfFiles)
        {
            Console.Write(ReadFile(pdf.FullName));
        }
        Console.Read();
    }

    public static string ReadFile(string destfolder)
    { 
        foreach(string file in Directory.Enumeratefiles(destfolder,"*.pdf"))
        {
            PdfReader pdfreader = new PdfReader(destfolder); 
        }
        string pdfText = string.Empty;
        for (int i = 1; i <= pdfreader.NumberOfPages; i++)
        {
            ITextExtractionStrategy itextextStrat = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy(); 
            PdfReader reader = new PdfReader(Filename);
            String extractText = PdfTextExtractor.GetTextFromPage(reader, i, itextextStrat);
            extractText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(extractText)));
            pdfText = pdfText + extractText; reader.Close(); } return pdfText; 
        } 
    }
}
PVitt
Anil Kumar

2 Answers


You'll have to

  1. Walk the directory tree. See this for an example.
  2. Then get the files. Alternatively, pass a SearchOption to GetFiles so that it searches subdirectories for you.
  3. Read the PDF files (Stack Overflow will help) and write to the text file.
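
The three steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: `PdfTextDump`, `DumpAll`, and the `extractText` delegate are hypothetical names, and the actual PDF reading is left to whichever library you use (the question uses iTextSharp).

```csharp
using System;
using System.IO;

static class PdfTextDump
{
    // Walks rootFolder and all of its subfolders, calls extractText on every
    // *.pdf file found, and writes the results to outputPath.
    // Returns the number of PDF files processed.
    // extractText is a placeholder for the PDF library call (hypothetical).
    public static int DumpAll(string rootFolder, string outputPath, Func<string, string> extractText)
    {
        int count = 0;
        // false = overwrite the output file if it already exists.
        using (StreamWriter writer = new StreamWriter(outputPath, false))
        {
            foreach (string pdfPath in Directory.EnumerateFiles(rootFolder, "*.pdf", SearchOption.AllDirectories))
            {
                // A header line per file keeps the combined output readable.
                writer.WriteLine("==== " + pdfPath + " ====");
                writer.WriteLine(extractText(pdfPath));
                count++;
            }
        }
        return count;
    }
}
```

It could be called as `PdfTextDump.DumpAll(@"C:\anil", @"C:\anil\output.txt", ReadFile);`, where the output path is only an example and `ReadFile` is any method that takes one file path and returns its text.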

P.S.: If you could give us more information about what you've tried or how you've approached this, you'll get more specific answers.

Community
abhinav
  • I tried this code but it's throwing errors. It should read all the PDF files in the folder and its subfolders. – Anil Kumar Nov 30 '11 at 10:32
  • It would be easier to debug if you could post the errors as well. – abhinav Nov 30 '11 at 11:29
  • "c:\\anil not found as file or resource." This is the error I'm getting when debugging in the ReadFile method. – Anil Kumar Nov 30 '11 at 11:33
  • A complete trace would be helpful, but from what you mentioned, either the file does not exist or you don't have access to it. What does (FileInfo) `pdf.Exists` return? – abhinav Nov 30 '11 at 11:42

The error suggests that the folder C:\anil either doesn't exist or that the account the program runs under does not have permission to access it.
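
One way to tell those two cases apart before touching any PDF is a quick check like the one below. This is only a sketch: `FolderCheck` and `Describe` are hypothetical names, and it uses nothing beyond `System.IO`.

```csharp
using System;
using System.IO;

static class FolderCheck
{
    // Returns null when the folder exists and its top level can be listed,
    // otherwise a short description of the problem.
    public static string Describe(string folder)
    {
        // Directory.Exists also returns false when the path cannot be read at all.
        if (!Directory.Exists(folder))
        {
            return "Folder not found or not readable: " + folder;
        }
        try
        {
            // Listing the top-level files forces an access check.
            Directory.GetFiles(folder);
            return null;
        }
        catch (UnauthorizedAccessException ex)
        {
            return "No permission to read " + folder + ": " + ex.Message;
        }
    }
}
```

Calling `FolderCheck.Describe(@"C:\anil")` at the start of `Main` and printing any non-null result would show which of the two it is.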

As for your code - several things stand out as possible issues.

You are treating the parameter passed into ReadFile as a folder, but you are passing in a file name.
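
A version of ReadFile that treats its parameter as a single file might look like the sketch below. It assumes iTextSharp 5.x (the library the question appears to use), and the class name `PdfText` is hypothetical. Note that "not found as file or resource" looks like the message iTextSharp's PdfReader raises when the path it is given is not a readable file, which would fit a folder path being passed to it.

```csharp
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

static class PdfText
{
    // Extracts the text of one PDF. pdfPath must point to a file, not a folder.
    public static string ReadFile(string pdfPath)
    {
        StringBuilder text = new StringBuilder();
        // Open the file once and reuse the reader for every page.
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            // iTextSharp page numbers start at 1.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // A fresh strategy per page, so earlier pages are not repeated.
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                text.Append(PdfTextExtractor.GetTextFromPage(reader, page, strategy));
            }
        }
        finally
        {
            // Release the file handle even if extraction fails.
            reader.Close();
        }
        return text.ToString();
    }
}
```

With this shape, the loop in `Main` that calls `ReadFile(pdf.FullName)` for each `FileInfo` can stay as it is, since it already passes one file at a time.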

Your foreach loop only covers the statement that directly follows it, because the rest of what you need to loop over is not inside its code block {}:

  string pdfText = string.Empty;
  foreach (string file in Directory.EnumerateFiles(destfolder, "*.pdf"))
  {
      // Open each file once, using the loop variable rather than the folder path.
      PdfReader pdfreader = new PdfReader(file);
      for (int i = 1; i <= pdfreader.NumberOfPages; i++)
      {
          ITextExtractionStrategy itextextStrat = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
          String extractText = PdfTextExtractor.GetTextFromPage(pdfreader, i, itextextStrat);
          extractText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(extractText)));
          pdfText = pdfText + extractText;
      }
      pdfreader.Close();
  }
  // Return only after every file has been read.
  return pdfText;
Oded
  • I tried it but it's not working. I'm getting an error at the PdfReader statement: Embedded statement cannot be a declaration or labeled statement – Anil Kumar Nov 30 '11 at 14:19
  • @AnilKumar - Did you debug through? What is the error? Do you have the right permissions? – Oded Nov 30 '11 at 14:21
  • Yes, I have the right permissions. The error is "Embedded statement cannot be a declaration or labeled statement" on this statement: PdfReader pdfreader = new PdfReader(destfolder); – Anil Kumar Nov 30 '11 at 14:26
  • @AnilKumar - Did you debug through? Do you have a `using` statement you have not posted? – Oded Nov 30 '11 at 14:28