
I'm building a simple file upload/file download functionality into my database. The only complicated part is that all files need to be encrypted using my fancy-shmancy encryption methods.

So what I do is make an SQL entry that stores things like id_file, filename, extension, size, dateadded, etc.

Then once I've got the id_file I take the file contents, encrypt them, then save the contents to my server as [id_file].txt.

Then here's the code for downloading the file again:

header("Pragma: public");
header('Content-Disposition: attachment;filename="'.$file['name'].'.'.$file['extension'].'"');
header('Cache-Control: max-age=0');

echo someFunctionIMadeForGettingAndDecryptingFileContents($_GET['id_file']);

exit;

Really simple stuff, and it works PERFECTLY for all file types EXCEPT .docx and .xlsx. When I download a .docx or .xlsx file, Office gives me an error saying "Word found unreadable content in [NAME OF FILE]. Do you want to recover the contents of this document? If you trust the source... bla bla". I then click 'Yes', it thinks a bit, and the file opens up just fine. But obviously I can't have my clients using this if they're going to get that error every time.

The code I've written works perfectly for all other file types. Even .doc, .xls, and .zip files work fine.

My first thought was to look at the headers. I've tried all sorts of solutions like the ones listed here:

  • why my downloaded file is alwayes damaged or corrupted?
  • PHP downloading excel file becomes corrupt

Those didn't work.

I know an issue can be with extra padding or white space being added to the file. But if I upload a .txt file and then download it again... I can see that there isn't anything extra being added.

If I MD5 the original file (good.docx) and the downloaded version of the original file (bad.docx), the hashes ARE different.

If I rename good.docx to good.zip and unzip the archive, do the same for bad.docx, and then MD5 both directories, the hashes are the SAME. I've also hashed each file inside good.zip and bad.zip, and each file hash is the same.

Also to note, elsewhere on my server I use PHPWord and PHPExcel to generate Office files dynamically and those files all download great. The headers/code I use for PHPExcel are:

header("Pragma: public");
header('Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet');
header('Content-Disposition: attachment;filename="'.$filename.'.xlsx"');
header('Cache-Control: max-age=0');
$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel2007');
$objWriter->save('php://output');
exit;

(Yes, I've tried using the "Content-Type" header on my other code above but that didn't help.)

I've also tried saving the file on my server, downloading it, and opening it. I get the same error when going through that process. Here is the code I used to do that:

$f=fopen("/myPath/temp.docx","w");
fwrite($f,someFunctionIMadeForGettingAndDecryptingFileContents($_GET['id_file']));
fclose($f);
exit;

I've also tried creating an empty Word file called "blank.docx" and having the function replace the contents of blank.docx with the decrypted file contents instead of saving a new file. But when I download blank.docx after that process I get the same result: an error, but the file eventually opens. None of the file properties (like Template: Normal.dotm) that were originally on blank.docx are present on the modified blank.docx that gets served.

I'm using Office 2007

UPDATE

Here is a link to download the good (original) version of a file: http://empowerdb.org/good.docx

And here is a link to download the bad (processed) version of the file: http://empowerdb.org/bad.docx

SOLUTION

As Mr. Llama pointed out below, my encryption function was lopping off some extra null bytes. But it turned out the culprit wasn't as obvious as you'd think. Here's my encryption:

trim(base64_encode(IV.mcrypt_encrypt(MCRYPT_RIJNDAEL_128,ENCKEY,$contents,MCRYPT_MODE_CBC,IV)))

The problem wasn't with trim() or with base64_encode(). It was with the mcrypt function. The way I solved this was before passing my file contents to get encrypted I did another base64_encode(). So like this...

$file_contents_encrypted=myEncryptionFunction(base64_encode($file_contents));

And of course the reverse upon decryption.

The base64_encode is technically being run twice. But I can see why it needs to run BEFORE mcrypt in this case: .docx and .xlsx files use a zip-like format that can end in null bytes (good.docx does), while base64 text never ends in a null byte.
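To make the failure mode concrete, here is a minimal sketch that simulates the padding step only. It performs no encryption and does not call mcrypt (which was removed from PHP core in 7.2). mcrypt_encrypt() pads its input with "\0" up to the block size; the sketch assumes the decrypt side then strips trailing null bytes, which matches the symptoms but is not shown in the question. The helper names zeroPad and stripZeroPadding are hypothetical.

```php
<?php
// Hypothetical helpers for illustration: they mimic padding behavior only.

const BLOCK_SIZE = 16; // block size of MCRYPT_RIJNDAEL_128, in bytes

// Mimics the zero padding mcrypt_encrypt() applies to fill whole blocks.
function zeroPad(string $data): string
{
    $remainder = strlen($data) % BLOCK_SIZE;
    return $remainder === 0
        ? $data
        : $data . str_repeat("\0", BLOCK_SIZE - $remainder);
}

// Assumption: the decrypt side removes padding by stripping trailing nulls.
function stripZeroPadding(string $data): string
{
    return rtrim($data, "\0");
}

// Stand-in binary payload that legitimately ends in four null bytes.
$original = "PK\x05\x06" . str_repeat("\0", 4);

// Plain round trip: the payload's own trailing nulls go out with the padding.
$plainRoundTrip = stripZeroPadding(zeroPad($original));

// Base64-wrapped round trip: the encoded text never ends in a null byte,
// so stripping nulls removes only the padding.
$wrappedRoundTrip = base64_decode(
    stripZeroPadding(zeroPad(base64_encode($original)))
);

var_dump($plainRoundTrip === $original);               // bool(false)
var_dump(strlen($original) - strlen($plainRoundTrip)); // int(4)
var_dump($wrappedRoundTrip === $original);             // bool(true)
```

The extra base64 pass costs roughly a third more storage. Another option, not what was done here, would be to store the original length next to the ciphertext and cut the decrypted data back to that length.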

rgbflawed
  • Have you tried setting `Content-Transfer-Encoding` to `base64` and encoding the binary data with the `base64` algorithm? – WBAR Jul 15 '14 at 13:36
  • I had tried Content-Transfer-Encoding as binary. I will try base64 now instead. How would you encode binary data with the base64 algorithm? – rgbflawed Jul 15 '14 at 13:38
  • http://php.net/manual/en/function.base64-encode.php – WBAR Jul 15 '14 at 13:41
  • That didn't help. Now the file can't be read at all. – rgbflawed Jul 15 '14 at 13:44
  • Maybe it is something with the date? If the md5 checksums are equal, it is something else. Maybe on each save docx creates a hash based on the date when the file was saved? – nacholibre Jul 15 '14 at 14:04
  • Yes... this sounds like the right direction. Good.docx has lots of information for the file properties. Things like the template being normal.dotm and the program being "Microsoft Office Word". But Bad.docx has nothing. I can't edit those properties in Bad.docx, though, to test to see if those are the missing parts! – rgbflawed Jul 15 '14 at 14:18
  • You said the MD5 sums were different after downloading. Can you upload the good/bad samples for analysis? – Mr. Llama Jul 15 '14 at 15:08
  • What happens when you try to verify the character encoding of the downloaded file? Does THAT at least match with the old file? It would help if you could give us a diff of the two files – Christopher Wirt Jul 15 '14 at 15:08
  • @Mr.Llama I updated the question with links to good.docx and bad.docx. Christopher, the only way I know how to verify encoding is to open the file in Notepad++. Both good and bad are ANSI. – rgbflawed Jul 15 '14 at 15:19

1 Answer


Your decryption function is lopping off null bytes at the end of files.

The good.docx file ends with four 0x00 bytes, while the bad.docx file ends with none. Aside from those missing bytes, the files are identical.

$ wc -c good.docx
25123 good.docx

$ wc -c bad.docx
25119 bad.docx

$ tail -c 32 good.docx | od -x
0000000 6666 6365 7374 782e 6c6d 4b50 0605 0000
0000020 0000 0010 0010 041c 0000 5df1 0000 0000

$ tail -c 32 bad.docx | od -x
0000000 7469 4568 6666 6365 7374 782e 6c6d 4b50
0000020 0605 0000 0000 0010 0010 041c 0000 5df1

If you skip the last four bytes of good.docx, the md5 sums match exactly:

$ head -c -4 good.docx | md5sum
fbd32fbcc02d62dfd8bd39d390252a4b *-

$ cat bad.docx | md5sum
fbd32fbcc02d62dfd8bd39d390252a4b *-
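
The same check can be sketched in PHP for anyone without a Unix shell. This is only an illustration: the helper names trailingNullCount and differsOnlyByTrailingNulls are hypothetical, and the strings are short stand-ins rather than the real good.docx and bad.docx (for real files, read each one with file_get_contents() first).

```php
<?php
// Hypothetical helper: counts the 0x00 bytes at the end of a string.
function trailingNullCount(string $bytes): int
{
    return strlen($bytes) - strlen(rtrim($bytes, "\0"));
}

// Hypothetical helper: true when the two strings are different but become
// identical once trailing null bytes are removed from both.
function differsOnlyByTrailingNulls(string $good, string $bad): bool
{
    return $good !== $bad
        && rtrim($good, "\0") === rtrim($bad, "\0");
}

// Stand-in data: a payload ending in four nulls, and a copy without them.
$good = "word/settings.xmlPK\x05\x06" . str_repeat("\0", 4);
$bad  = rtrim($good, "\0");

var_dump(trailingNullCount($good));                 // int(4)
var_dump(trailingNullCount($bad));                  // int(0)
var_dump(differsOnlyByTrailingNulls($good, $bad));  // bool(true)
```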
Mr. Llama
  • Yes... this sounds possible. Both my encryption and decryption functions use trim(). I will try taking that out now... – rgbflawed Jul 15 '14 at 15:34
  • removing trim() from both encrypt and decrypt didn't help. But I'm pretty sure this is the right road to be going down (and will give you the answer). Can you just answer these last two bits, though... 1) What command line are you typing into to get these readings? 2) Is there any way I can manually add those four null bytes back onto bad.docx to see if it opens properly? – rgbflawed Jul 15 '14 at 15:40
  • I'm using Cygwin on Windows, but the commands (which appear after the `$`) will work just fine on any Unix system. If you don't have access to either, you can download a hex editor like [HxD](http://mh-nexus.de/en/hxd/) to manually add the missing bytes. To manually add the bytes using Cygwin/Unix, the following will work: `head -c 4 /dev/zero >> bad.docx` – Mr. Llama Jul 15 '14 at 15:50
  • I've got it working!!! I'll update my question with the solution in case anyone here is curious or this happens to anyone else. Thanks again Mr. Llama! I'll award you the bounty when I'm able to. – rgbflawed Jul 15 '14 at 15:58
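
For readers with neither Cygwin nor a hex editor, the manual repair suggested in the comments (appending the missing null bytes) could also be done from PHP. This is a rough sketch under assumptions: appendNullBytes is a hypothetical helper, and the demonstration writes to a temporary stand-in file rather than the real bad.docx.

```php
<?php
// Hypothetical helper: appends $count null bytes to the file at $path.
function appendNullBytes(string $path, int $count): void
{
    $written = file_put_contents($path, str_repeat("\0", $count), FILE_APPEND);
    if ($written === false) {
        throw new RuntimeException("Could not append to $path");
    }
}

// Demonstration on a temporary stand-in file (4 bytes before the append).
$path = tempnam(sys_get_temp_dir(), 'docx');
file_put_contents($path, "PK\x05\x06");

appendNullBytes($path, 4);

$size = strlen(file_get_contents($path));
var_dump($size); // int(8)

unlink($path);
```

Work on a copy of the file when trying this, since the append modifies the file in place.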