0

Here I am converting one file and storing the result in another folder. How can I do this for multiple files?

try 
{ 
     Document document = new Document();
     document.open();
     FileOutputStream fos=new FileOutputStream("C:\\Users\\user\\Desktop\\pdf\\MyCSVFile.csv");
     StringBuilder parsedText=new StringBuilder();
     PdfReader reader1 = new PdfReader("C:\\Users\\user\\Desktop\\pdf\\NL-26.pdf");
     int n = reader1.getNumberOfPages();
     for (int i = 0; i <n ; i++) 
     {
        parsedText.append(parsedText+PdfTextExtractor.getTextFromPage(reader1, i+1).trim()+"\n") ;
     }
     StringReader stReader = new StringReader(parsedText.toString());
     int t;
     while((t=stReader.read())>0)
         fos.write(t);
     document.close();
}
catch (IOException e)
{
     e.printStackTrace();
}
JeffC
etishree

3 Answers

1

You can get all the files under the given directory using this:

final File folder = new File("C:\\Users\\user\\Desktop\\pdf");
// listFiles() returns null if the folder does not exist or is not a directory
final File[] listOfFiles = folder.listFiles();

if (listOfFiles != null) {
    for (int i = 0; i < listOfFiles.length; i++) {
        final File file = listOfFiles[i];
        // Skip sub-directories and anything that is not a .pdf file
        if (file.isFile() && file.getAbsolutePath().endsWith(".pdf")) {
            parsePdf("C:\\Users\\user\\Desktop\\pdf\\MyCSVFile"+i+".csv", file.getAbsolutePath());
        }
    }
}

If you move your conversion logic into a separate method, you can call it from inside the if block:

// Needs: import java.nio.charset.StandardCharsets;
private static void parsePdf(final String fileToWrite, final String fileToRead) throws IOException {
    final PdfReader reader1 = new PdfReader(fileToRead);
    try (FileOutputStream fos = new FileOutputStream(fileToWrite)) {
        final StringBuilder parsedText = new StringBuilder();
        final int n = reader1.getNumberOfPages();
        // Page numbers start at 1
        for (int i = 1; i <= n; i++) {
            // Append only the new page; appending parsedText to itself would duplicate earlier pages
            parsedText.append(PdfTextExtractor.getTextFromPage(reader1, i).trim()).append("\n");
        }
        // Write the text as UTF-8 bytes; fos.write(int) keeps only the low byte of each char
        fos.write(parsedText.toString().getBytes(StandardCharsets.UTF_8));
    } finally {
        reader1.close();
    }
}
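
As the comments below point out, the extension check matters when the folder holds other file types. A minimal sketch of a case-insensitive filter, assuming a hypothetical helper class `PdfLister` (not part of the original answer), could look like this. It also returns an empty list when the folder cannot be listed, instead of failing on a null array:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; the name is an assumption.
final class PdfLister {

    // Returns the names of regular files in 'folder' that end in .pdf
    // (ignoring case), sorted in natural String order.
    static List<String> listPdfNames(File folder) {
        // The two-argument lambda is a java.io.FilenameFilter.
        File[] matches = folder.listFiles((dir, name) ->
                name.toLowerCase(Locale.ROOT).endsWith(".pdf"));

        List<String> names = new ArrayList<>();
        // listFiles returns null if 'folder' is missing or not a directory.
        if (matches == null) {
            return names;
        }
        for (File f : matches) {
            // Ignore directories that happen to be named like "x.pdf".
            if (f.isFile()) {
                names.add(f.getName());
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

Each returned name can then be combined with the folder path and passed to `parsePdf`.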
Rashin
  • Is this assuming every file inside of the directory ends in pdf? What if there is another file type in that directory? – Nexevis Jun 24 '19 at 12:39
  • It's just an assumption. It would be logical to put only .pdf files in a folder named `pdf` if someone wants to parse a lot of them. :) Of course we can check the extension of the file as another condition in the if statement. – Rashin Jun 24 '19 at 12:45
  • You say that but you are literally outputting all the `.csv` files into the folder called `pdf`. So you are already breaking your own logic. You shouldn't just _assume_ that the code will be used correctly and not account for it. – Nexevis Jun 24 '19 at 12:46
  • Thank you for your remarks. Updated my answer according to them. :) – Rashin Jun 24 '19 at 12:50
  • @Roshni Why is it showing an error? It says to change PdfReader to String. – etishree Jun 25 '19 at 05:55
  • Updated my answer. You have to modify the `parsePdf("C:\\Users\\user\\Desktop\\pdf\\MyCSVFile"+i+".csv", file.getAbsolutePath());` and the `private static void parsePdf(final String fileToWrite, final String fileToRead) throws IOException {` lines. – Rashin Jun 25 '19 at 06:06
  • Use the above mentioned two new lines in your program. I changed the method signature. – Rashin Jun 25 '19 at 06:11
0

You can do something like this:

public static void convertAllCSV(String directory)
{
    try 
    {
        ArrayList<String> files = findFiles(directory); //Returns list of all files in folder with .pdf extension

        for (String s : files)
        {
            convertSingleCSV(s, directory); //Your current code placed into a method
        }
    } 
    catch (IOException e) 
    {
        e.printStackTrace();
    }   
}

With the findFiles method looking like this:

public static ArrayList<String> findFiles(String directory) throws IOException
{
    ArrayList<String> fileList = new ArrayList<String>();
    File dir = new File(directory);

    String ext = ".pdf";
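    String[] files = dir.list(); //Returns null if the path does not exist or is not a directory
    if (files == null)
    {
        throw new IOException("Cannot list directory: " + directory);
    }
    for (String file : files)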
    {
        //If the file ends with .pdf
        if(file.endsWith(ext))
        {
            fileList.add(file);
        }
    }
    return fileList;
}

There are basically two steps to add: pass in a directory name and find all the files in it with the .pdf extension, then loop over that list and call your original method once per file.

convertSingleCSV is your existing code moved into a method that uses the file name and directory to build the output path. Instead of hard-coding the name passed to FileOutputStream, derive it from the input file name like this:

String fileNameNoExtension = fileName.substring(0, fileName.lastIndexOf('.'));  //Cuts off the file extension to append csv instead of pdf
FileOutputStream fos = new FileOutputStream(directory + "\\" + fileNameNoExtension + ".csv");

The advantage of doing it this way is that you keep the original file names and just create a new file with the .csv extension. It also only attempts to convert .pdf files, so you do not have to worry about other files being in the directory you pass.
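
The file name derivation above can also be sketched with `java.nio.file.Path`, which avoids hard-coding the `\\` separator. The class `CsvNames` and the method `csvPathFor` are hypothetical names used only for illustration. This version also copes with a file name that has no extension, where `lastIndexOf('.')` returns -1:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; the names are assumptions.
final class CsvNames {

    // Maps e.g. <anywhere>/NL-26.pdf to <outputDir>/NL-26.csv.
    static Path csvPathFor(Path pdf, Path outputDir) {
        String name = pdf.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // If there is no dot (or the name starts with one), keep the whole name.
        String base = (dot > 0) ? name.substring(0, dot) : name;
        return outputDir.resolve(base + ".csv");
    }

    public static void main(String[] args) {
        // Relative example paths; replace with real directories.
        Path out = csvPathFor(Paths.get("pdf", "NL-26.pdf"), Paths.get("csv"));
        System.out.println(out);
    }
}
```

The resulting path can be turned into a String with `toString()` or opened directly with `Files.newOutputStream`.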

Nexevis
0

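You can also use the java.nio.file API, available since Java 7. Note that the lambda below needs Java 8 and Path.of needs Java 11; on older versions use Paths.get and a regular for loop: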

try(DirectoryStream<Path> stream = Files.newDirectoryStream(Path.of("C:\\Users\\user\\Desktop\\pdf\\"), "*.pdf")) {
    stream.forEach(path -> {
        // process the current PDF file (path.toFile to access java.io.File)
    });
} catch (IOException ex) {
    // fail !
}
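
For reference, here is a self-contained sketch of the same idea that collects the matching paths into a list. The class `PdfDirectoryScan` and the method `findPdfs` are hypothetical names. It uses a plain for loop instead of `forEach`, so a per-file conversion that throws IOException could be called inside the loop without wrapping the exception. Whether the `*.pdf` glob also matches `.PDF` depends on the platform.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; the names are assumptions.
final class PdfDirectoryScan {

    // Returns every regular file directly inside 'dir' matching the glob *.pdf.
    static List<Path> findPdfs(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        // try-with-resources closes the directory handle when done.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.pdf")) {
            for (Path path : stream) {
                if (Files.isRegularFile(path)) {
                    result.add(path);
                }
            }
        }
        return result;
    }
}
```

A caller would loop over the returned list and pass each path (or `path.toFile()`) to its own conversion method.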
Dorian