4

Before reading anything else, please take time to read the original thread.

Overview: a .xfdl file is a gzipped .xml file which has then been encoded in base64. I wish to de-encode the .xfdl into xml which I can then modify and then re-encode back into a .xfdl file.

xfdl > xml.gz > xml > xml.gz > xfdl

I have been able to take a .xfdl file and de-encode it from base64 using uudeview:

uudeview -i yourform.xfdl

Then decommpressed it using gunzip

gunzip -S "" < UNKNOWN.001 > yourform-unpacked.xml

The xml produced is 100% readable and looks wonderful. Without modifying the xml then, i should be able to re-compress it using gzip:

gzip yourform-unpacked.xml

Then re-encoded in base-64:

base64 -e yourform-unpacked.xml.gz yourform_reencoded.xfdl

If my thinking is correct, the original file and the re-encoded file should be equal. If I put yourform.xfdl and yourform_reencoded.xfdl into beyond compare, however, they do not match up. Also, the original file can be viewed in an http://www.grants.gov/help/download_software.jsp#pureedge">.xfdl viewer. The viewer says that the re-encoded xfdl is unreadable.

I have also tried uuenview to re-encode in base64, it also produces the same results. Any help would be appreciated.

Community
  • 1
  • 1
CodingWithoutComments
  • 35,598
  • 21
  • 73
  • 86

8 Answers8

2

As far as I know you cannot find the compression level of an already compressed file. When you are compressing the file you can specify the compression level with -# where the # is from 1 to 9 (1 being the fastest compression and 9 being the most compressed file). In practice you should never compare a compressed file with one that has been extracted and recompressed, slight variations can easily crop up. In your case I would compare the base64 encoded versions instead of the gzip'd versions.

John Downey
  • 13,854
  • 5
  • 37
  • 33
1

You'll need to put the following line at the beginning of the XFDL file:

application/vnd.xfdl; content-encoding="base64-gzip"

After you've generated the base64-encoded file, open it in a text editor and paste the line above on the first line. Ensure that the base64'ed block starts at the beginning of the second line.

Save it and try it in the Viewer! If it still does not work, it may be that the changes that were made to the XML made it non-compliant in some manner. In this case, after the XML has been modified, but before it has been gzipped and base64 encoded, save it with an .xfdl file extension and try opening it with the Viewer tool. The viewer should be able to parse and display the uncompressed / unencoded file if it is in a valid XFDL format.

Markus Safar
  • 6,324
  • 5
  • 28
  • 44
Paradigm
  • 141
  • 2
1

Check these out:

http://www.ourada.org/blog/archives/375

http://www.ourada.org/blog/archives/390

They are in Python, not Ruby, but that should get you pretty close.

And the algorithm is actually for files with header 'application/x-xfdl;content-encoding="asc-gzip"' rather than 'application/vnd.xfdl; content-encoding="base64-gzip"' But the good news is that PureEdge (aka IBM Lotus Forms) will open that format with no problem.

Then to top it off, here's a base64-gzip decode (in Python) so you can make the full round-trip:

with open(filename, 'r') as f:
  header = f.readline()
  if header == 'application/vnd.xfdl; content-encoding="base64-gzip"\n':
    decoded = b''
    for line in f:
      decoded += base64.b64decode(line.encode("ISO-8859-1"))
    xml = zlib.decompress(decoded, zlib.MAX_WBITS + 16)
CrazyPyro
  • 3,257
  • 3
  • 30
  • 39
  • (That's not my blog, BTW.) And credit for that MAX_WBITS magic: http://stackoverflow.com/questions/1838699/how-can-i-decompress-a-gzip-stream-with-zlib – CrazyPyro Feb 16 '11 at 21:49
1

I did this in Java with the help of the Base64 class from http://iharder.net/base64.

I've been working on an application to do form manipulation in Java. I decode the file, create an DOM document from the XML then write it back to file.

My code in Java to read the file looks like this:

public XFDLDocument(String inputFile) 
        throws IOException, 
            ParserConfigurationException,
            SAXException

{
    fileLocation = inputFile;

    try{

        //create file object
        File f = new File(inputFile);
        if(!f.exists()) {
            throw new IOException("Specified File could not be found!");
        }

        //open file stream from file
        FileInputStream fis = new FileInputStream(inputFile);

        //Skip past the MIME header
        fis.skip(FILE_HEADER_BLOCK.length());   

        //Decompress from base 64                   
        Base64.InputStream bis = new Base64.InputStream(fis, 
                Base64.DECODE);

        //UnZIP the resulting stream
        GZIPInputStream gis = new GZIPInputStream(bis);

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        doc = db.parse(gis);

        gis.close();
        bis.close();
        fis.close();

    }
    catch (ParserConfigurationException pce) {
        throw new ParserConfigurationException("Error parsing XFDL from file.");
    }
    catch (SAXException saxe) {
        throw new SAXException("Error parsing XFDL into XML Document.");
    }
}

My code in java looks like this to write the file to disk:

    /**
     * Saves the current document to the specified location
     * @param destination Desired destination for the file.
     * @param asXML True if output needs should be as un-encoded XML not Base64/GZIP
     * @throws IOException File cannot be created at specified location
     * @throws TransformerConfigurationExample
     * @throws TransformerException 
     */
    public void saveFile(String destination, boolean asXML) 
        throws IOException, 
            TransformerConfigurationException, 
            TransformerException  
        {

        BufferedWriter bf = new BufferedWriter(new FileWriter(destination));
        bf.write(FILE_HEADER_BLOCK);
        bf.newLine();
        bf.flush();
        bf.close();

        OutputStream outStream;
        if(!asXML) {
            outStream = new GZIPOutputStream(
                new Base64.OutputStream(
                        new FileOutputStream(destination, true)));
        } else {
            outStream = new FileOutputStream(destination, true);
        }

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.transform(new DOMSource(doc), new StreamResult(outStream));

        outStream.flush();
        outStream.close();      
    }

Hope that helps.

MrWizard54
  • 515
  • 2
  • 8
  • 20
1

I've been working on something like that and this should work for php. You must have a writable tmp folder and the php file be named example.php!

    <?php
    function gzdecode($data) {
        $len = strlen($data);
        if ($len < 18 || strcmp(substr($data,0,2),"\x1f\x8b")) {
            echo "FILE NOT GZIP FORMAT";
            return null;  // Not GZIP format (See RFC 1952)
        }
        $method = ord(substr($data,2,1));  // Compression method
        $flags  = ord(substr($data,3,1));  // Flags
        if ($flags & 31 != $flags) {
            // Reserved bits are set -- NOT ALLOWED by RFC 1952
            echo "RESERVED BITS ARE SET. VERY BAD";
            return null;
        }
        // NOTE: $mtime may be negative (PHP integer limitations)
        $mtime = unpack("V", substr($data,4,4));
        $mtime = $mtime[1];
        $xfl   = substr($data,8,1);
        $os    = substr($data,8,1);
        $headerlen = 10;
        $extralen  = 0;
        $extra     = "";
        if ($flags & 4) {
            // 2-byte length prefixed EXTRA data in header
            if ($len - $headerlen - 2 < 8) {
                return false;    // Invalid format
                echo "INVALID FORMAT";
            }
            $extralen = unpack("v",substr($data,8,2));
            $extralen = $extralen[1];
            if ($len - $headerlen - 2 - $extralen < 8) {
                return false;    // Invalid format
                echo "INVALID FORMAT";
            }
            $extra = substr($data,10,$extralen);
            $headerlen += 2 + $extralen;
        }

        $filenamelen = 0;
        $filename = "";
        if ($flags & 8) {
            // C-style string file NAME data in header
            if ($len - $headerlen - 1 < 8) {
                return false;    // Invalid format
                echo "INVALID FORMAT";
            }
            $filenamelen = strpos(substr($data,8+$extralen),chr(0));
            if ($filenamelen === false || $len - $headerlen - $filenamelen - 1 < 8) {
                return false;    // Invalid format
                echo "INVALID FORMAT";
            }
            $filename = substr($data,$headerlen,$filenamelen);
            $headerlen += $filenamelen + 1;
        }

        $commentlen = 0;
        $comment = "";
        if ($flags & 16) {
            // C-style string COMMENT data in header
            if ($len - $headerlen - 1 < 8) {
                return false;    // Invalid format
                echo "INVALID FORMAT";
            }
            $commentlen = strpos(substr($data,8+$extralen+$filenamelen),chr(0));
            if ($commentlen === false || $len - $headerlen - $commentlen - 1 < 8) {
                return false;    // Invalid header format
                echo "INVALID FORMAT";
            }
            $comment = substr($data,$headerlen,$commentlen);
            $headerlen += $commentlen + 1;
        }

        $headercrc = "";
        if ($flags & 1) {
            // 2-bytes (lowest order) of CRC32 on header present
            if ($len - $headerlen - 2 < 8) {
                return false;    // Invalid format
                echo "INVALID FORMAT";
            }
            $calccrc = crc32(substr($data,0,$headerlen)) & 0xffff;
            $headercrc = unpack("v", substr($data,$headerlen,2));
            $headercrc = $headercrc[1];
            if ($headercrc != $calccrc) {
                echo "BAD CRC";
                return false;    // Bad header CRC
            }
            $headerlen += 2;
        }

        // GZIP FOOTER - These be negative due to PHP's limitations
        $datacrc = unpack("V",substr($data,-8,4));
        $datacrc = $datacrc[1];
        $isize = unpack("V",substr($data,-4));
        $isize = $isize[1];

        // Perform the decompression:
        $bodylen = $len-$headerlen-8;
        if ($bodylen < 1) {
            // This should never happen - IMPLEMENTATION BUG!
            echo "BIG OOPS";
            return null;
        }
        $body = substr($data,$headerlen,$bodylen);
        $data = "";
        if ($bodylen > 0) {
            switch ($method) {
                case 8:
                    // Currently the only supported compression method:
                    $data = gzinflate($body);
                    break;
                default:
                    // Unknown compression method
                    echo "UNKNOWN COMPRESSION METHOD";
                return false;
            }
        } else {
            // I'm not sure if zero-byte body content is allowed.
            // Allow it for now...  Do nothing...
            echo "ITS EMPTY";
        }

        // Verifiy decompressed size and CRC32:
        // NOTE: This may fail with large data sizes depending on how
        //       PHP's integer limitations affect strlen() since $isize
        //       may be negative for large sizes.
        if ($isize != strlen($data) || crc32($data) != $datacrc) {
            // Bad format!  Length or CRC doesn't match!
            echo "LENGTH OR CRC DO NOT MATCH";
            return false;
        }
        return $data;
    }
    echo "<html><head></head><body>";
    if (empty($_REQUEST['upload'])) {
        echo <<<_END
    <form enctype="multipart/form-data" action="example.php" method="POST">
    <input type="hidden" name="MAX_FILE_SIZE" value="100000" />
    <table>
    <th>
    <input name="uploadedfile" type="file" />
    </th>
    <tr>
    <td><input type="submit" name="upload" value="Convert File" /></td>
    </tr>
    </table>
    </form>
    _END;

    }
    if (!empty($_REQUEST['upload'])) {
        $file           = "tmp/" . $_FILES['uploadedfile']['name'];
        $orgfile        = $_FILES['uploadedfile']['name'];
        $name           = str_replace(".xfdl", "", $orgfile);
        $convertedfile  = "tmp/" . $name . ".xml";
        $compressedfile = "tmp/" . $name . ".gz";
        $finalfile      = "tmp/" . $name . "new.xfdl";
        $target_path    = "tmp/";
        $target_path    = $target_path . basename($_FILES['uploadedfile']['name']);
        if (move_uploaded_file($_FILES['uploadedfile']['tmp_name'], $target_path)) {
        } else {
            echo "There was an error uploading the file, please try again!";
        }
        $firstline      = "application/vnd.xfdl; content-encoding=\"base64-gzip\"\n";
        $data           = file($file);
        $data           = array_slice($data, 1);
        $raw            = implode($data);
        $decoded        = base64_decode($raw);
        $decompressed   = gzdecode($decoded);
        $compressed     = gzencode($decompressed);
        $encoded        = base64_encode($compressed);
        $decoded2       = base64_decode($encoded);
        $decompressed2  = gzdecode($decoded2);
        $header         = bin2hex(substr($decoded, 0, 10));
        $tail           = bin2hex(substr($decoded, -8));
        $header2        = bin2hex(substr($compressed, 0, 10));
        $tail2          = bin2hex(substr($compressed, -8));
        $header3        = bin2hex(substr($decoded2, 0, 10));
        $tail3          = bin2hex(substr($decoded2, -8));
        $filehandle     = fopen($compressedfile, 'w');
        fwrite($filehandle, $decoded);
        fclose($filehandle);
        $filehandle     = fopen($convertedfile, 'w');
        fwrite($filehandle, $decompressed);
        fclose($filehandle);
        $filehandle     = fopen($finalfile, 'w');
        fwrite($filehandle, $firstline);
        fwrite($filehandle, $encoded);
        fclose($filehandle);
        echo "<center>";
        echo "<table style='text-align:center' >";
        echo "<tr><th>Stage 1</th>";
        echo "<th>Stage 2</th>";
        echo "<th>Stage 3</th></tr>";
        echo "<tr><td>RAW DATA -></td><td>DECODED DATA -></td><td>UNCOMPRESSED DATA -></td></tr>";
        echo "<tr><td>LENGTH: ".strlen($raw)."</td>";
        echo "<td>LENGTH: ".strlen($decoded)."</td>";
        echo "<td>LENGTH: ".strlen($decompressed)."</td></tr>";
        echo "<tr><td><a href='tmp/".$orgfile."'/>ORIGINAL</a></td><td>GZIP HEADER:".$header."</td><td><a href='".$convertedfile."'/>XML CONVERTED</a></td></tr>";
        echo "<tr><td></td><td>GZIP TAIL:".$tail."</td><td></td></tr>";
        echo "<tr><td><textarea cols='30' rows='20'>" . $raw . "</textarea></td>";
        echo "<td><textarea cols='30' rows='20'>" . $decoded . "</textarea></td>";
        echo "<td><textarea cols='30' rows='20'>" . $decompressed . "</textarea></td></tr>";
        echo "<tr><th>Stage 6</th>";
        echo "<th>Stage 5</th>";
        echo "<th>Stage 4</th></tr>";
        echo "<tr><td>ENCODED DATA <-</td><td>COMPRESSED DATA <-</td><td>UNCOMPRESSED DATA <-</td></tr>";
        echo "<tr><td>LENGTH: ".strlen($encoded)."</td>";
        echo "<td>LENGTH: ".strlen($compressed)."</td>";
        echo "<td>LENGTH: ".strlen($decompressed)."</td></tr>";
        echo "<tr><td></td><td>GZIP HEADER:".$header2."</td><td></td></tr>";
        echo "<tr><td></td><td>GZIP TAIL:".$tail2."</td><td></td></tr>";
        echo "<tr><td><a href='".$finalfile."'/>FINAL FILE</a></td><td><a href='".$compressedfile."'/>RE-COMPRESSED FILE</a></td><td></td></tr>";
        echo "<tr><td><textarea cols='30' rows='20'>" . $encoded . "</textarea></td>";
        echo "<td><textarea cols='30' rows='20'>" . $compressed . "</textarea></td>";
        echo "<td><textarea cols='30' rows='20'>" . $decompressed  . "</textarea></td></tr>";
        echo "</table>";
        echo "</center>";
    }
    echo "</body></html>";
    ?>
0

gzip will put the filename in the header of file, so that a gzipped file vary in length depending on the filename of the uncompressed file.

If gzip acts on a stream, the filename is omitted and the file is a bit shorter, so the following should work:

gzip yourform-unpacked.xml.gz

Then re-encoded in base-64: base64 -e yourform-unpacked.xml.gz yourform_reencoded.xfdl

maybe this will produce a file of the same length

Alex Lehmann
  • 668
  • 1
  • 6
  • 11
0

Different implementations of the gzip algorithm will always produce slightly different but still correct files, also the compression level of the original file may be different then what you are running it at.

John Downey
  • 13,854
  • 5
  • 37
  • 33
0

Interesting, I'll give it a shot. The variations aren't slight, however. The newly encoded file is longer and when comparing the binary of the before and after, the data hardly matches up at all.

Before (the first three lines)

H4sIAAAAAAAAC+19eZOiyNb3/34K3r4RT/WEU40ssvTtrhuIuKK44Bo3YoJdFAFZ3D79C6hVVhUq
dsnUVN/qmIkSOLlwlt/JPCfJ/PGf9dwAlorj6pb58wv0LfcFUEzJknVT+/ml2uXuCSJP3kNf/vOQ
+TEsFVkgoDfdn18mnmd/B8HVavWt5TsKI2vKN8magyENiH3Lf9kRfpd817PmF+jpiOhQRFZcXTMV

After (the first three lines):

H4sICJ/YnEgAAzEyNDQ2LTExNjk2NzUueGZkbC54bWwA7D1pU+JK19/9FV2+H5wpByEhJMRH
uRUgCMom4DBYt2oqkAZyDQlmQZ1f/3YSNqGzKT3oDH6RdE4vOXuf08vFP88TFcygYSq6dnlM
naWOAdQGuqxoo8vjSruRyGYzfII6/id3dPGjVKwCBK+Zl8djy5qeJ5NPT09nTduAojyCZwN9

As you can see H4SI match up, then after that it's pandemonium.

Markus Safar
  • 6,324
  • 5
  • 28
  • 44
CodingWithoutComments
  • 35,598
  • 21
  • 73
  • 86
  • But unless you use exactly the same gzip implementation, you would only ever expect the H4sI to be the same. "Pandemonium" is normal :-) – Rory Alsop Mar 29 '11 at 08:04