
We are developing a Java-based application and use Hibernate for the DAO layer. I have XML data stored as a Clob in the database. With Hibernate I get the Clob object and parse it into a String. Then I hash it (SHA-256).

The Clob object has two methods:

  1. getAsciiStream() returns an InputStream
  2. getCharacterStream() returns a Reader

The parser turns the InputStream or the Reader into what looks like exactly the same String. However, the hashes generated from the two are different. Why?

Context: We are currently using getAsciiStream(), which cannot handle many special characters, so we have to change to getCharacterStream(). However, our application (which does hash comparison for business reasons) and our customers will be affected by the change.

Example XML:
<application>
    <header>
        <subtitle>WHAT EVER</subtitle>
        <page_header>WHAT EVER</page_header>
    </header>
    <form>
        <application_form_name>WHAT EVER</application_form_name>
        <section>
            <section_name>WHAT EVER</section_name>
            <question>
                <question_text>How is this applicant to something?</question_text>
                <answer>WHAT EVER</answer>
            </question>
            <question>
                <question_text>What is something?</question_text>
                <answer>WHAT EVER</answer>
            </question>
            <question>
                <question_text>Type of Customer</question_text>
                <answer>Organization</answer>
            </question>
            <question>
                <question_text>Full legal name of the applicant:</question_text>
                <answer/>
            </question>
            <question>
                <question_text>Legal Name</question_text>
                <answer>WHAT EVER</answer>
            </question>
            <question>
                <question_text>WHAT EVER</question_text>
                <answer/>
            </question>
            <question>
                <question_text>WHAT EVER</question_text>
                <answer/>
            </question>
            <question>
                <question_text>WHAT EVER</question_text>
                <answer/>
            </question>
            <question>
                <question_text>WHAT EVER</question_text>
                <answer/>
            </question>
        </section>
    </form>
</application>
nishu
Rama
  • Possibly: different encodings are being used and you didn't see slight differences in the resulting String objects. – Jim Garrison Jul 23 '14 at 16:45
  • Hashing is done on bytes, not on characters. If you're hashing a `String`, it's getting converted back into bytes -- possibly in some different way than the way it was originally converted from. – Louis Wasserman Jul 23 '14 at 16:55
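To illustrate the point made in the comments, here is a minimal sketch, not the asker's actual code. The class name `HashEncodingDemo`, the helper `sha256Hex`, and the sample strings are made up for the example. It hashes the same String after encoding it with two different charsets: for pure ASCII text the digests match, but as soon as one non-ASCII character is present they differ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashEncodingDemo {

    // Hypothetical helper: hex-encoded SHA-256 of `text` encoded with `charset`.
    public static String sha256Hex(String text, Charset charset) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            // The digest is computed over bytes, so the charset decides the input.
            byte[] digest = md.digest(text.getBytes(charset));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ascii = "<answer>WHAT EVER</answer>";
        String accented = "<answer>Caf\u00e9</answer>"; // contains U+00E9

        // ASCII-only text encodes to the same bytes in both charsets.
        System.out.println(sha256Hex(ascii, StandardCharsets.UTF_8)
                .equals(sha256Hex(ascii, StandardCharsets.ISO_8859_1)));    // true

        // U+00E9 is one byte in ISO-8859-1 but two bytes in UTF-8.
        System.out.println(sha256Hex(accented, StandardCharsets.UTF_8)
                .equals(sha256Hex(accented, StandardCharsets.ISO_8859_1))); // false
    }
}
```

So if either the stream decoding or the `String.getBytes` call in the two code paths uses a different charset (or the platform default), the hashes can differ even when the Strings look the same on screen.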

1 Answer


AFAIK getAsciiStream is deprecated

Also, please provide your hash-calculation code for both versions (getAsciiStream/getCharacterStream), the whole thing: how you read from the DB and how you feed the data to MessageDigest. That is much more informative than your XML sample.

By the way, since XML contains non-significant whitespace, there is a special way to calculate a message digest on XML data that ignores non-significant whitespace.

By the way, you are still doing it wrong. You can store ANY character in XML with plain latin1 encoding just by using entities. So you can keep your XML-reading code as is and change only your XML-to-String code (specify latin1 encoding for the XmlWriter).
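As a rough sketch of the entity idea (an illustration only; `XmlEntityEscaper` and `escapeNonAscii` are hypothetical names, and a real XML writer configured with the target encoding would normally do this for you), every character outside 7-bit ASCII can be rewritten as a numeric character reference. The sketch uses the ASCII cut-off rather than latin1 so the result also survives getAsciiStream():

```java
public class XmlEntityEscaper {

    // Hypothetical helper: replaces every code point above 0x7F with a
    // decimal numeric character reference such as &#233;.
    // Assumption: the input is already well-formed XML text and the
    // non-ASCII characters occur only in text or attribute values
    // (references are not valid in element names, comments, or CDATA).
    public static String escapeNonAscii(String xml) {
        StringBuilder out = new StringBuilder(xml.length());
        // Iterate by code point so surrogate pairs become one reference.
        xml.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("<answer>Caf\u00e9</answer>"));
        // prints <answer>Caf&#233;</answer>
    }
}
```

The output contains only ASCII bytes, so it encodes identically in ASCII, latin1, and UTF-8. Note that the escaped text is a different byte sequence from the unescaped original, so hashes computed before and after such a change would not match.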

Community
dimzon