
I wrote code that reads all the PDF files in a folder, gets their bytes, and writes them to a .dat file. It actually works and writes all the bytes to the .dat file, but when I open that .dat file with Acrobat, it opens with a black page.

Here is my code:

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Calendar;

public class xmlfile1filebytes {

  public static void main(String[] args) throws IOException {

    File folder = new File ("07072013");
    File[] listOfFiles = folder.listFiles();

    System.out.println("There are " + listOfFiles.length + " files"); 
    String filesin;

    String timeStamp = new SimpleDateFormat("MM-dd-yyyy[HH.mm.ss]")
     .format(Calendar.getInstance().getTime());
     System.out.println(timeStamp);

    BufferedWriter xmlfile = null;
    BufferedWriter datfile = null;

    String outxmlfile = ("07072013\\" + timeStamp + ".xml");
    xmlfile = new BufferedWriter(new FileWriter(outxmlfile));

    String outdatfile = ("07072013\\" + timeStamp + ".dat");
    datfile = new BufferedWriter(new FileWriter(outdatfile));

    int offset = 0;
    int size = 0;

    for (int i = 0; i < listOfFiles.length; i++) {

        File f = listOfFiles[i];

       // System.out.println(i + " " + f.getAbsolutePath());
        if (f.isFile()) {

            filesin = listOfFiles[i].getName();

            if (filesin.endsWith("pdf")) {

                Path aPath = Paths.get(f.getAbsolutePath()); 

                System.out.println(filesin);

                byte[] actualBytes = Files.readAllBytes(aPath);
                size = actualBytes.length;

                xmlfile.append((i + 1) + ")" + " File = " + filesin + ", Offset = " + offset + ", Size = " + size + "\n");


                offset = offset + size;
                xmlfile.newLine();

                String s = new String(actualBytes);

                datfile.append(s);
                datfile.newLine();


                File datfileinfolder = new File ("07072013\\" + timeStamp + ".dat");

                long datfilesize = datfileinfolder.length();
                final int BLOCK_SIZE = 200 * 1024;

                for (int curBlock = 0;  curBlock < actualBytes.length; curBlock += BLOCK_SIZE) {
                    String toWrite = new String(
                            Arrays.copyOfRange(actualBytes, curBlock, Math.min(curBlock + BLOCK_SIZE, actualBytes.length)));

                     String suffix = "";

                     if (curBlock > 0) {
                         //append underscores other file information and then perform writes
                         suffix =  String.valueOf(curBlock /  BLOCK_SIZE);
                     }    

                     BufferedWriter datfile1 = null;
                     String outdatfile1 = ("07072013\\" + suffix + timeStamp + ".dat");
                     datfile1 = new BufferedWriter(new FileWriter(outdatfile1));


                     datfile1.append(toWrite);
                     datfile1.close(); 

                }

                //long datfilesizeinkb = datfilesize /1024;

                //System.out.println("Size = " + datfilesizeinkb);



             }
        }
    }
     datfile.close();
     xmlfile.close();
  }
}
Jayraj Patel
  • I don't know the binary format of a pdf file, but I don't think you can concatenate all the bytes of many pdf files and have one file with all the pages, as the files also contain metadata which must be parsed by the program opening the file. – jlordo Jul 10 '13 at 18:41
  • No, actually I tried renaming the PDF files manually to .dat and then opening them with Acrobat, and that works. But when I run it through my program, it doesn't work. What's your idea about it? I mean, is it a good idea to rename all the files in the folder, change the extension to .dat, and merge them? Please tell me. Thanks! – Jayraj Patel Jul 10 '13 at 18:48
  • What I'm saying is (that I believe), if you just append the bytes of many pdf files into 1 file, that file won't be a valid pdf file, hence the black screen. – jlordo Jul 10 '13 at 18:51
  • Exactly, you have to use a tool like [PDFtk](http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/). If you want to do it in Java the [iText](http://itextpdf.com) library will be of assistance. – devconsole Jul 10 '13 at 18:51
  • See [this](http://stackoverflow.com/questions/3585329/how-to-merge-two-pdf-files-into-one-in-java) for example. – jFrenetic Jul 10 '13 at 18:52
  • Your code doesn't make a .dat per PDF file. It makes a single one and streams all of the PDFs into that file. It won't work because PDF has headers at the front; you can't just concatenate like that. – mprivat Jul 10 '13 at 18:53
  • I just want to read all the PDF files in the folder as byte arrays and then write them out to the .dat file, one after another. Is my code right or wrong? If it's wrong, please suggest something. Thanks in advance! – Jayraj Patel Jul 10 '13 at 18:57
  • That's what you are doing. You are reading all pdf files, and dump them into a single dat file. This dat file is not a valid pdf file. (This would only be the case if your folder only contains 1 pdf file). – jlordo Jul 10 '13 at 18:59
  • Oh, OK, thank you so much! Do you have any idea how to do this: while writing the .dat file, when it reaches 200 KB, close that .dat file, create a new one, and continue writing, so that every 200 KB a new .dat file is created? I am really confused here. Please help. Thanks! – Jayraj Patel Jul 10 '13 at 19:07
  • You can try something yourself, and if it doesn't work, post the code here and describe the specific problem you have. – jlordo Jul 10 '13 at 19:10
  • Actually I tried, but it didn't work correctly. While writing the .dat file, when it reaches 200 KB, close that file and create a new one. I edited the code. Is it possible? Please help. Thanks! – Jayraj Patel Jul 10 '13 at 19:12
  • Hey guys, please help. While writing the .dat file, when it reaches 200 KB, close that .dat file and create a new one, so basically every 200 KB a new .dat file is created. I already tried it and updated the code in my post, but it's not working. Please help. Thanks in advance! – Jayraj Patel Jul 10 '13 at 19:24
  • You have been told what to do. PDF files have a specific format, and you need software like iText to concatenate PDF files. – Gilbert Le Blanc Jul 10 '13 at 19:45
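
If the goal is only the byte-level packing described in the comments (every PDF written into one .dat file, one after another, with an offset and size recorded for each), a minimal sketch might look like the following. It is not the asker's code: the class `DatPacker`, its `Entry` type, and the `pack` method are hypothetical names chosen for illustration. It writes through an `OutputStream` rather than converting the bytes to a `String`, because a `String` round trip through a charset can alter binary data, and it adds no separators between files so the recorded offsets stay accurate. The resulting .dat file is still not a valid PDF; the index is what would let each original be sliced back out.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class DatPacker {

    /** Hypothetical index entry: where one source file sits inside the .dat file. */
    public static final class Entry {
        public final String name;
        public final long offset;
        public final int size;

        Entry(String name, long offset, int size) {
            this.name = name;
            this.offset = offset;
            this.size = size;
        }
    }

    /** Writes the raw bytes of each input back to back and returns the index. */
    public static List<Entry> pack(List<Path> inputs, Path datFile) throws IOException {
        List<Entry> index = new ArrayList<>();
        long offset = 0;
        try (OutputStream out = Files.newOutputStream(datFile)) {
            for (Path input : inputs) {
                byte[] data = Files.readAllBytes(input);
                // Raw bytes only: no charset conversion and no line separators.
                out.write(data);
                index.add(new Entry(input.getFileName().toString(), offset, data.length));
                offset += data.length;
            }
        }
        return index;
    }
}
```

Each returned `Entry` carries the same information the question's .xml file records (file name, offset, size), so it could be written out in whatever index format is needed.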

1 Answer


It's unclear from your post and your comments what you're really trying to accomplish. Your original question seemed to be about merging multiple PDF files into a single .dat file, which you expected to be able to open with Acrobat.

If that's what you're trying to do, then I suggest using Apache PDFBox and in particular the PDFMergerUtility class. An outline of the code would be like this:

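PDFMergerUtility merger = new PDFMergerUtility();
File[] files = folder.listFiles();
for (File file : files) {
    // Only add PDFs; the folder may also hold the generated .xml and .dat files.
    if (file.isFile() && file.getName().toLowerCase().endsWith(".pdf")) {
        merger.addSource(file);
    }
}

merger.setDestinationFileName("output.pdf");
merger.mergeDocuments();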

That should combine your source files into a single large PDF file. You could, of course, use a .dat extension on this file, but I'm not sure why you would do so. The only thing that would accomplish is to break the file extension association so double-clicking the file wouldn't open it.

The second question you were asking was how to break the data into 200 KB chunks. I'm unsure why you want to do this. If you do this, you will not (necessarily) be able to open the resulting files in Acrobat. PDF files are pretty specific about their internal format. Partial files will not open. If the goal is to have one output file for each input file, then a simple file copy would accomplish this. If the goal is to take all of these files and merge them into a single stream in 200 KB chunks (again, why?), then you might want to consider using a compression library instead. In that case, this answer may get you started.
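
If plain 200 KB chunks of the combined byte stream really are the goal, a minimal sketch, assuming raw bytes with no compression, could look like this. The class `ChunkedDatWriter`, the method `writeChunks`, and the `baseName_N.dat` naming scheme are hypothetical, not something from the question. It keeps one output stream open, counts the bytes written to the current chunk, and rolls over to a new file when the chunk is full, so a single PDF may span several chunks. The bytes go through an `OutputStream` unchanged rather than through a `String`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ChunkedDatWriter {

    /** Writes all inputs back to back into numbered chunk files of at most blockSize bytes. */
    public static List<Path> writeChunks(List<Path> inputs, Path outDir,
                                         String baseName, int blockSize) throws IOException {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<Path> chunks = new ArrayList<>();
        OutputStream out = null;
        int written = 0; // bytes written to the current chunk
        try {
            for (Path input : inputs) {
                byte[] data = Files.readAllBytes(input);
                int pos = 0;
                while (pos < data.length) {
                    // Start a new chunk file when none is open or the current one is full.
                    if (out == null || written == blockSize) {
                        if (out != null) {
                            out.close();
                        }
                        Path chunk = outDir.resolve(baseName + "_" + chunks.size() + ".dat");
                        out = Files.newOutputStream(chunk);
                        chunks.add(chunk);
                        written = 0;
                    }
                    // Write as much of this file as still fits in the current chunk.
                    int n = Math.min(blockSize - written, data.length - pos);
                    out.write(data, pos, n);
                    pos += n;
                    written += n;
                }
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
        return chunks;
    }
}
```

With `blockSize` set to `200 * 1024`, every chunk except possibly the last is exactly 200 KB. None of the chunks is a PDF on its own; they only become useful again once concatenated in order and split back by recorded offsets.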

Ian McLaird