I'm developing a small Java program to bulk-compare BLOBs stored in an Oracle database with files on a remote disk. For the file comparison I'm using an MD5 hash. The strange behaviour is that every time I download the same BLOB I get a different MD5 hash. I'm using JDK 1.8 and ojdbc6.jar; my Oracle version is Oracle Database 10g Release 10.2.0.4.0. This is my code for the MD5 checksum:

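    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.security.DigestInputStream;
    import java.security.MessageDigest;

    public static String getMD5Sum(String filePath) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");

        // Stream the file through the digest; reading in blocks avoids one call per byte.
        try (InputStream is = Files.newInputStream(Paths.get(filePath));
             DigestInputStream dis = new DigestInputStream(is, md)) {
            byte[] buffer = new byte[8192];
            while (dis.read(buffer) != -1) {
                // Nothing to do: the digest is updated as a side effect of reading.
            }
        }

        // Convert the 16-byte digest into a 32-character lowercase hex string.
        StringBuilder result = new StringBuilder();
        for (byte b : md.digest()) {
            result.append(String.format("%02x", b));
        }
        return result.toString();
    }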

And this is how I get the BLOB from the database:

    public static FileBean getBlobAndData() throws Exception
    {
        Connection con = getConnection();
        // The SQL is built from concatenated literals, since a Java string literal cannot span lines.
        PreparedStatement pstmt = con.prepareStatement(
            "select s.doc_testo as path, r.img_referto as BINARY "
            + "from storicoccsfse s, soss.rd_refoasis4 r "
            + "where s.id_doc_esterno = r.cod_centro || r.cod_scheda "
            + "and s.id_doc_esterno = ?");
        pstmt.setString(1, "RXC20100010024");
        ResultSet rs = pstmt.executeQuery();
        FileBean fileBean = new FileBean();
        String fileChecksum = "";
        while (rs.next()) {
            String remotefilename = rs.getString("path");
            Blob blob = rs.getBlob("BINARY");
            InputStream ins = blob.getBinaryStream();

            // Write the BLOB to a local PDF file through iText.
            File targetFile = new File("C:\\Temp\\whatever.pdf");
            OutputStream outStream = new FileOutputStream(targetFile);
            PdfReader pdfreader = new PdfReader(ins);
            PdfStamper pdfStamper = new PdfStamper(pdfreader, outStream);
            pdfStamper.close();
            pdfreader.close();
            fileBean.setRemotefilename(remotefilename);

            // Hash the file that was just written.
            fileChecksum = getMD5Sum(targetFile.getPath());
        }
        fileBean.setDigest(fileChecksum);
        return fileBean;
    }

I've tried different ways to download the BLOB and convert it to PDF, but every time I create the checksum I get a different value. When I open the BLOB from PL/SQL Developer with Acrobat Reader, the checksum is correct and matches the file stored on disk. Any thoughts? Thanks in advance, Andrea

Jason Armstrong

1 Answer

The strange behaviour of my program is that every time I download the same BLOB I get a different MD5 hash.

As it writes to outStream, PdfStamper adds extra information to the PDF that changes on every run (e.g., a current timestamp). That extra information is included in the MD5 hash, so the hash changes each time.
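One way to avoid this, sketched here under the assumption that only the stored bytes need to be compared, is to hash the BLOB stream directly and, if a local copy is wanted, to copy the bytes without passing them through PdfStamper. The class name `BlobMd5Sketch` and the methods `md5Hex` and `saveUnchanged` are hypothetical names chosen for this illustration; they are not part of the original code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class BlobMd5Sketch {

    /** Returns the MD5 of every byte readable from the stream, as lowercase hex. */
    public static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        int n;
        // Feed the digest block by block until the stream is exhausted.
        while ((n = in.read(buffer)) != -1) {
            md.update(buffer, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Copies the stream to the target file byte for byte, with no PDF rewriting. */
    public static void saveUnchanged(InputStream in, Path target) throws IOException {
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

In the question's loop this would amount to calling `BlobMd5Sketch.md5Hex(blob.getBinaryStream())` in place of the PdfReader/PdfStamper round trip. The result should then match the checksum of the file on disk, provided the stored bytes really are identical.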

Matthew McPeak