I need a function that checks whether an HttpPostedFileBase is a Word document. I don't want to check the file extension, because the user can change it.

I tried reading the header of the binary data, which starts with PK (PDF files, for example, start with %PDF), but I don't know whether I can rely on that.

// Requires: using System; using System.IO; using System.Text; using System.Web; using System.Web.Mvc;
[HttpPost]
public ActionResult UploadFile(HttpPostedFileBase file)
{
    string header;
    using (MemoryStream ms = new MemoryStream())
    {
        file.InputStream.CopyTo(ms);
        ms.Position = 0;

        // Read raw bytes: a StreamReader would decode the binary data as text.
        byte[] buffer = new byte[4];
        int read = ms.Read(buffer, 0, buffer.Length);

        // ASCII is enough for the signatures compared below; only the
        // bytes actually read are converted, so short uploads are safe.
        header = Encoding.ASCII.GetString(buffer, 0, read);
    }

    if (header.StartsWith("%PDF", StringComparison.Ordinal))
    {
        // PDF Document
    }

    if (header.StartsWith("PK", StringComparison.Ordinal))
    {
        // Microsoft Word Document ?
    }

    return Json(new { }, JsonRequestBehavior.AllowGet);
}
Catalin
  • How about using the Microsoft.Office DLLs to check it? The Open method (http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word.documents.open.aspx) opens an existing document, so it should tell you whether the file is a valid document. – SridharVenkat Jan 11 '14 at 10:17
  • You may find some help here: http://msdn.microsoft.com/en-us/library/gg615596(v=office.14).aspx – Craig Moore Jan 11 '14 at 10:18
  • http://stackoverflow.com/questions/58510/using-net-how-can-you-find-the-mime-type-of-a-file-based-on-the-file-signature/9435701 – Jeremy Bell Jan 12 '14 at 14:07
  • @SridharVenkat this is also a good idea. I was trying to keep the project clean and avoid too many dependencies, but I think this will work – Catalin Jan 13 '14 at 07:09

1 Answer

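The first two bytes of a Word document (DOCX) are PK because a DOCX file is actually a ZIP (PKZip) archive. Every other ZIP-based file (XLSX, PPTX, JAR, a plain .zip) starts with the same two bytes, so no, this check alone is not reliable. Legacy .doc files do not start with PK at all; they use the OLE2 compound file format.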

The ForensicsWiki pages here may help:

http://www.forensicswiki.org/wiki/Word_Document_%28DOC%29

and

http://www.forensicswiki.org/wiki/DOCX
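
A rough sketch of one way to go beyond the PK check without adding Office dependencies: look at the leading bytes and, for ZIP input, open the archive and check for the parts a DOCX package normally contains. The names WordSniffer, WordKind and Classify are made up for this example, and it assumes .NET 4.5 or later for System.IO.Compression. Treating word/document.xml as the main part is a heuristic (it is the conventional name, and .docm/.dotx packages contain it too), and the OLE2 signature is shared by legacy .xls, .ppt and .msi files, so that branch only says the file might be a .doc.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical result type for this sketch.
public enum WordKind { NotWord, LegacyDocCandidate, Docx }

public static class WordSniffer
{
    // OLE2 compound file signature (legacy .doc, but also .xls, .ppt, .msi).
    private static readonly byte[] Ole2Magic =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    public static WordKind Classify(Stream input)
    {
        // Buffer the upload so the bytes can be inspected and then re-read as a ZIP.
        var buffer = new MemoryStream();
        input.CopyTo(buffer);
        byte[] data = buffer.ToArray();

        if (data.Length >= Ole2Magic.Length &&
            data.Take(Ole2Magic.Length).SequenceEqual(Ole2Magic))
        {
            return WordKind.LegacyDocCandidate;
        }

        // ZIP local file header signature: 'P' 'K' 0x03 0x04.
        if (data.Length < 4 || data[0] != 0x50 || data[1] != 0x4B ||
            data[2] != 0x03 || data[3] != 0x04)
        {
            return WordKind.NotWord;
        }

        try
        {
            using (var zip = new ZipArchive(new MemoryStream(data), ZipArchiveMode.Read))
            {
                // Heuristic: an Open XML package has [Content_Types].xml, and a
                // word processing document conventionally has word/document.xml.
                bool hasContentTypes = zip.GetEntry("[Content_Types].xml") != null;
                bool hasMainPart = zip.GetEntry("word/document.xml") != null;
                return hasContentTypes && hasMainPart ? WordKind.Docx : WordKind.NotWord;
            }
        }
        catch (InvalidDataException)
        {
            // Starts like a ZIP but cannot be read as one.
            return WordKind.NotWord;
        }
    }
}
```

In the controller this could be called as WordSniffer.Classify(file.InputStream). It only checks the container structure, not whether Word can actually open the file, so opening the document with a real parser is still the stricter test.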

Craig Moore