0

I am trying to determine the type of a file (.pdf, .doc, .docx, etc.) programmatically, without using a shell command. Actually I have to make an application which blocks access to files of a particular extension. I have already hooked sys_call_table in an LKM, and now I want the LKM to check the file type whenever an open/read system call is triggered.

I know that the current pointer gives access to the current process structure, and that we can use it to find the file name stored in the dentry structure. I also understand that in Linux a file type is identified by a magic number stored in the first bytes of the file. But I don't know how to find the file type, or exactly where it is stored.

Ramiz Raja
  • 4
    Why reinvent the wheel? Just call [file](http://linux.die.net/man/1/file) (or check its source code). – m0skit0 Feb 05 '13 at 12:40
  • There's the easy way and the hard way. The easy way is to call `file` with a `system()` call or just guess based on the file extension. The hard way is to reinvent `file`: parse the file and determine its type based on the content. – netcoder Feb 05 '13 at 12:46
  • But I have to find the file type in C/C++. Can you give a code example using a system() call? – Ramiz Raja Feb 05 '13 at 12:47
  • This question suggests using GLib/GIO: http://stackoverflow.com/questions/1629172/how-do-you-get-the-icon-mime-type-and-application-associated-with-a-file-in-th – acraig5075 Feb 05 '13 at 12:48
  • 1
    No, I am not assuming that every file will have an extension, because in Linux the extension has no meaning. So the file type is identified by a magic number stored at the start of the file contents. – Ramiz Raja Feb 05 '13 at 12:51
  • @RamizRaja Is this for school or some kind of training? Then I think you're supposed to inspect the name, it sounds too complicated otherwise. – unwind Feb 05 '13 at 13:02
  • @unwind This is part of my final year project. I can find the file name, which is stored in the dentry structure, but I don't know how to read the magic number of a file. – Ramiz Raja Feb 05 '13 at 13:21
  • "Actually i have to make an application which blocks access to files of a particular extension." This strikes me as futile. If any app kept me from reading a PDF file by testing whether the file name ends in ".pdf", I'd simply rename it to `file_i_want_to_access` and open it. You are trying to solve a problem completely the wrong way. Hooking into the kernel open call is just insane, IMHO. Why not use file permissions? Or jails? Or some other funky technology? It would help if you told us your *actual problem* instead of some obscure syscall magic you want to perform to solve a subsubsubproblem. – Jens Feb 05 '13 at 12:52
  • I agree, but Linux does not identify a file by its extension; instead it uses a magic number stored in the first bytes of the file to identify its type. So even if you rename a file, Linux will still identify its correct type. – Ramiz Raja Feb 05 '13 at 13:14
  • 1
    @RamizRaja The "magic number" is not an accurate description, it's shorthand. It means data from the file itself. For instance, PNG image files start with [an 8-byte header](http://en.wikipedia.org/wiki/Portable_Network_Graphics#File_header) which you can of course look for quite easily. – unwind Feb 05 '13 at 13:26
  • @unwind Actually, I don't know how to read the magic number in kernel space with the file name and current pointer in hand. Can you give a code example? – Ramiz Raja Feb 05 '13 at 13:33

2 Answers

6

Linux doesn't "store" the file type for its files (unlike Mac OS' resource fork, which I think is the most well-known platform to do this). Files are just named streams of bytes; they have no structure implied by the operating system.

Either you just tell programs which file to use (and then it Does What You Say), or programs use higher-level features to figure it out.

There are programs that re-invent this particular wheel (I'm responsible for one of those), but you can also use e.g. file(1). Of course that requires your program to parse and "understand" the textual output you'll get, which in a sense only moves the problem.
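For user-space code only, a minimal sketch of the file(1) route might look like the following. The helper name `mime_type_via_file` is made up for this illustration, and it assumes a `file` binary on the `PATH` that accepts `--brief` and `--mime-type`. It uses fork/exec rather than `system()` so the path is never interpreted by a shell.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: copies the MIME type printed by file(1) into out.
 * Returns 0 on success, -1 on any failure. User space only. */
int mime_type_via_file(const char *path, char *out, size_t outlen)
{
    int fds[2];
    int status;
    pid_t pid;
    ssize_t n;
    size_t used = 0;

    if (outlen == 0)
        return -1;
    out[0] = '\0';
    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: redirect stdout into the pipe, then run file(1). */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execlp("file", "file", "--brief", "--mime-type", "--", path,
               (char *)NULL);
        _exit(127); /* exec failed, e.g. file(1) is not installed */
    }

    /* Parent: collect the child's output, leaving room for the NUL. */
    close(fds[1]);
    while (used < outlen - 1 &&
           (n = read(fds[0], out + used, outlen - 1 - used)) > 0)
        used += (size_t)n;
    close(fds[0]);
    out[used] = '\0';
    out[strcspn(out, "\n")] = '\0'; /* keep only the first line */

    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return out[0] != '\0' ? 0 : -1;
}
```

A caller would pass a buffer, for example `char buf[128]; mime_type_via_file("report.pdf", buf, sizeof buf);`, and then compare `buf` against strings such as `application/pdf`. Some versions of file(1) may report an unreadable path in the output text while still exiting with status 0, so the returned string should be validated rather than trusted.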

However, I don't think calling into file from kernel space is very wise, so it's probably best to re-create the test for whatever set of types you need, to keep it small.

In other words, you should simply re-implement the required tests. This is quite complicated in general, so if you really need to do it for as large a set of types as possible, it might not be a very good idea. :/
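As a rough illustration of what "re-implementing the tests" could mean for a handful of types, here is a sketch in plain user-space C that compares the first bytes of a buffer against a few well-known signatures. The enum, table, and function names are invented for this example. It assumes the leading bytes of the file have already been read into memory, which in a kernel module would have to be done through whatever read path is available there.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type codes for this sketch; not taken from any library. */
enum sniffed_type {
    SNIFF_UNKNOWN = 0,
    SNIFF_PDF,
    SNIFF_PNG,
    SNIFF_JPEG,
    SNIFF_ZIP,  /* also .docx, .xlsx, .odt: these are ZIP containers */
    SNIFF_OLE2  /* legacy .doc, .xls, .ppt compound documents */
};

struct signature {
    enum sniffed_type type;
    const unsigned char *bytes;
    size_t len;
};

static const unsigned char sig_pdf[]  = { '%', 'P', 'D', 'F', '-' };
static const unsigned char sig_png[]  = { 0x89, 'P', 'N', 'G',
                                          0x0D, 0x0A, 0x1A, 0x0A };
static const unsigned char sig_jpeg[] = { 0xFF, 0xD8, 0xFF };
static const unsigned char sig_zip[]  = { 'P', 'K', 0x03, 0x04 };
static const unsigned char sig_ole2[] = { 0xD0, 0xCF, 0x11, 0xE0,
                                          0xA1, 0xB1, 0x1A, 0xE1 };

static const struct signature signatures[] = {
    { SNIFF_PDF,  sig_pdf,  sizeof sig_pdf  },
    { SNIFF_PNG,  sig_png,  sizeof sig_png  },
    { SNIFF_JPEG, sig_jpeg, sizeof sig_jpeg },
    { SNIFF_ZIP,  sig_zip,  sizeof sig_zip  },
    { SNIFF_OLE2, sig_ole2, sizeof sig_ole2 },
};

/* Guess the type from the first len bytes of a file.
 * Only offset 0 is checked; anything unrecognised is SNIFF_UNKNOWN. */
enum sniffed_type sniff_type(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof signatures / sizeof signatures[0]; i++) {
        const struct signature *s = &signatures[i];

        if (len >= s->len && memcmp(buf, s->bytes, s->len) == 0)
            return s->type;
    }
    return SNIFF_UNKNOWN;
}
```

This is still only a guess based on content. A .docx and any other ZIP archive look identical at this level, so telling them apart would need a deeper look inside the container, which is part of why the general problem gets complicated quickly.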

unwind
  • I need to support any five file types. – Ramiz Raja Feb 05 '13 at 12:52
  • What do you actually mean by re-create the test? – Ramiz Raja Feb 05 '13 at 13:39
  • 1
    @Ramiz Raja There are no magic numbers, so what you (and the file tool) need to do to identify the file type, is to read the first piece of data in the file, and guess what kind of file it is. That's the test you must create in the kernel - read a piece of the file, and try to guess what kind of file type it is. – nos Feb 06 '13 at 10:07
  • OK, then if I limit the number of extensions/types it will work. Can you give a code example showing how to read the first piece of data in a file? As far as I know, each file system registers its functions with the VFS, and the VFS actually calls those functions for reading/writing. So how can I read the first piece of file data? Please give a code example if you can, or suggest a link. – Ramiz Raja Feb 06 '13 at 17:11
2

> Actually i have to make an application which blocks access to files of a particular extension.

That's a flawed requirement. If you check by file extension, you'll miss files that don't have one, which is quite common on Linux since it does not rely on file extensions.

The conventional way of detecting file types on Linux is by content, using magic numbers (byte signatures near the start of the file). The shell command file is basically just a front end for libmagic, so user-space code has the option of linking against that library.

Lie Ryan
  • It's just a flaw in the way the question was worded. I saw that too so I specifically asked the OP about that in the comments and OP acknowledges that: `No i am not assuming that every file will have an extension` – Mike Feb 05 '13 at 13:10