0

I have a project to build a file-upload website for my university using PHP. For every uploaded file, the website must check whether the file is corrupted. I've been searching for a way to do this and found nothing.

Edward A
user2382730

1 Answer

4

Checking whether a PDF file is corrupted

Read the first five bytes of the PDF file. If they are the string %PDF-, the file is not corrupted; otherwise it is corrupted.

Here is the working code:

<?php
// Open in binary mode so the bytes are read unmodified
$fp = fopen('mypdffile.pdf', 'rb');
if ($fp === false)
{
  exit("Could not open the file.");
}

// A freshly opened file is already positioned at byte 0
$data = fread($fp, 5);   // read 5 bytes from byte 0
fclose($fp);

if ($data === "%PDF-")
{
  echo "The PDF File is not Corrupted.";
}
else
{
  echo "The PDF File is Corrupted.";
}
?>

Explanation: Open any non-corrupted PDF file in Notepad++ and you will notice that its first five bytes are the string "%PDF-". This is the header of a valid PDF file, and we can take advantage of it to test whether the file is corrupted.
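A header check alone will not notice an upload that was cut off partway. As a rough sketch, and not a full validity test, the same idea can be extended to also look for the %%EOF marker that normally closes a PDF file. The function name pdfLooksComplete and the 1024-byte tail window are assumptions made for illustration.

```php
<?php
// Hypothetical helper (name and 1024-byte window are illustrative assumptions).
// Returns true when the file starts with "%PDF-" and contains an "%%EOF"
// marker somewhere in its last 1024 bytes.
function pdfLooksComplete(string $path): bool
{
    $fp = @fopen($path, 'rb');
    if ($fp === false) {
        return false;                       // unreadable files are treated as invalid
    }

    $header = fread($fp, 5);                // first five bytes of the file

    // Work out how many bytes to read from the end (at most 1024).
    $stat = fstat($fp);
    $tailLength = min(1024, (int) $stat['size']);

    $tail = '';
    if ($tailLength > 0) {
        fseek($fp, -$tailLength, SEEK_END); // jump to the start of the tail
        $tail = fread($fp, $tailLength);
    }
    fclose($fp);

    return $header === '%PDF-' && strpos((string) $tail, '%%EOF') !== false;
}
```

Calling pdfLooksComplete('mypdffile.pdf') returns a boolean instead of printing a message, which makes it easier to reuse in an upload handler. A file can still pass both checks and be damaged in the middle.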


Checking whether a .docx file is corrupted

DOCX files are in ZIP format, in which the first two bytes are the letters PK (after ZIP's creator, Phil Katz).


So modify the above code (with $fp opened on the .docx file):

fseek($fp, 0);
$data = fread($fp, 2);   // read 2 bytes from byte 0
if(strcmp($data,"PK")==0)
{
  echo "The docx File is not Corrupted.";
}
else
{
  echo "The docx File is Corrupted.";
}
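Both checks follow the same pattern, so they can be folded into one function. The sketch below is only illustrative: the name hasExpectedSignature and the signature map are assumptions, not part of any library. It uses the four-byte ZIP local file header (PK followed by bytes 03 04) for .docx, and adds the OLE compound file signature that legacy .doc files normally start with.

```php
<?php
// Hypothetical helper: compares the first bytes of a file with the
// signature expected for its extension. The map is an illustrative
// assumption and covers only the three types discussed here.
function hasExpectedSignature(string $path, string $extension): bool
{
    $signatures = [
        'pdf'  => "%PDF-",
        'docx' => "PK\x03\x04",                         // ZIP local file header
        'doc'  => "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1",   // OLE compound file
    ];

    $extension = strtolower($extension);
    if (!isset($signatures[$extension])) {
        return false;               // unknown type: nothing to compare against
    }

    $expected = $signatures[$extension];

    $fp = @fopen($path, 'rb');      // binary mode, bytes are read unmodified
    if ($fp === false) {
        return false;
    }
    $actual = fread($fp, strlen($expected));
    fclose($fp);

    return $actual === $expected;
}
```

In an upload handler this could be called with the temporary path PHP gives the uploaded file, for example hasExpectedSignature($_FILES['upload']['tmp_name'], 'docx'), where the field name upload is made up for the example. Like the code above, it only confirms that the file starts the way that format should.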
Ritesh Kumar Gupta
  • **I don't know how to check whether a .doc file is corrupted.** – Ritesh Kumar Gupta May 14 '13 at 18:28
  • Just because it has a valid header doesn't mean it's not corrupted, although this is a good start. – Eric Petroelje May 14 '13 at 18:31
  • The only thing you're checking is whether or not the files are really pdf/doc files. Just as Eric said, having a fine header doesn't mean the file isn't corrupted. His best bet would be to find c++ libraries that can open doc/pdf files or a program that can do just that and check if the file is corrupted or not. That's where `system` or cgi comes into play. `acroread` is a good one for checking pdfs as pointed out [here](http://stackoverflow.com/a/10344051/1312672). – Edward A May 14 '13 at 19:29
  • Thanks very much for the answer. I will give it a try. Let me know if you know how to check for the corrupted doc file. – user2382730 May 14 '13 at 19:40
  • @all: The above answer will work fine for the majority of cases. I didn't say that checking for a valid header is the best way to detect corruption. However, even if you use another, better method, my answer can be incorporated as step one of that method. – Ritesh Kumar Gupta May 15 '13 at 02:40