Most languages lack this, so I would be very surprised to find it in OCaml. Apache does it with a mime.types file; you can look there for hints. This is the most usual way: a huge table that maps extensions to MIME types. You can implement it in OCaml easily:
let mimetype_of_extension = function
  | "txt" | "log" -> "text/plain"
  | "html" | "htm" -> "text/html"
  | "zip" -> "application/zip"
...
Another way is to look at the file contents, but then you basically need to know about the various file formats.
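To give an idea of what looking at the contents involves, here is a minimal sketch that compares the first bytes of a file against a few well-known signatures (PNG, PDF, ZIP). The names starts_with, mimetype_of_header and mimetype_of_file are made up for this illustration, and three signatures are nowhere near a complete list.

```ocaml
(* Hypothetical helper: true if [s] begins with [prefix]. *)
let starts_with prefix s =
  String.length s >= String.length prefix
  && String.sub s 0 (String.length prefix) = prefix

(* Guess a MIME type from the leading bytes of a file.
   Only three signatures are checked; anything else gives None. *)
let mimetype_of_header header =
  if starts_with "\x89PNG\r\n\x1a\n" header then Some "image/png"
  else if starts_with "%PDF-" header then Some "application/pdf"
  else if starts_with "PK\x03\x04" header then Some "application/zip"
  else None

(* Read at most the first 8 bytes of [path] and classify them. *)
let mimetype_of_file path =
  let ic = open_in_bin path in
  let header =
    try
      let n = min 8 (in_channel_length ic) in
      let s = really_input_string ic n in
      close_in ic; s
    with e -> close_in_noerr ic; raise e
  in
  mimetype_of_header header
```

Note that plain text formats have no such signature, so this approach only helps with binary formats.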
That said, it does not help you much, since source files of all languages are normally treated as text/plain. They are not distinguishable by MIME type, so I really have no idea what your get_language_from_mime_type function does.
However, filename extensions of various source files are more-or-less standardised, so if you know the extension, you will know the language. Getting the extension is as simple as ripping whatever follows the last period from the filename.
let extension_of_filename filename =
  (* Everything after the last '.'; String.rindex raises Not_found
     if the filename contains no period. *)
  let pos = String.rindex filename '.' + 1 in
  String.sub filename pos (String.length filename - pos);;
Well, okay, simple in any language except Brainfuck and OCaml, at least. After that, it's easy - "c" is a C program, as is "h"; "ml" is OCaml; etc.
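Putting the two steps together might look like the sketch below. It assumes a reasonably recent OCaml (Filename.extension needs 4.04 or later), and the names lowercase_extension, language_of_extension and language_of_filename are invented for this example. The table holds only the extensions mentioned above; you would extend it yourself.

```ocaml
(* Extension without the leading dot, lowercased; "" when there is none.
   Filename.extension returns e.g. ".ml", or "" if there is no extension. *)
let lowercase_extension filename =
  let ext = Filename.extension filename in
  if ext = "" then ""
  else String.lowercase_ascii (String.sub ext 1 (String.length ext - 1))

(* Illustrative table only; unknown extensions give None. *)
let language_of_extension = function
  | "c" | "h" -> Some "C"
  | "ml" -> Some "OCaml"
  | _ -> None

let language_of_filename filename =
  language_of_extension (lowercase_extension filename)
```

Returning an option instead of raising keeps files without a period (a Makefile, say) from blowing up the caller.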