81

How does one reliably determine a file's type? File extension analysis is not acceptable. Is there a Ruby-esque tool similar to the UNIX file(1) command?

This is regarding MIME or content type, not file system classifications, such as directory, file, or socket.

13 Answers

58

There is a ruby binding to libmagic that does what you need. It is available as a gem named ruby-filemagic:

gem install ruby-filemagic

It requires libmagic-dev to be installed.

The documentation seems a little thin, but this should get you started:

$ irb 
irb(main):001:0> require 'filemagic' 
=> true
irb(main):002:0> fm = FileMagic.new
=> #<FileMagic:0x7fd4afb0>
irb(main):003:0> fm.file('foo.zip') 
=> "Zip archive data, at least v2.0 to extract"
irb(main):004:0> 
Martin Carpenter
35

If you're on a Unix machine try this:

mimetype = `file -Ib #{path}`.gsub(/\n/,"")

I'm not aware of any pure Ruby solutions that work as reliably as 'file'.

Edited to add: depending on which OS you are running, you may need to use 'i' instead of 'I' to get file to return a MIME type.
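
Since the short flag differs between platforms, spelling out the long options sidesteps the question. The following is only a minimal sketch: it assumes file(1) is on the PATH and accepts `--brief` and `--mime-type` (the options used in the comments below), and `mime_type_of` is a hypothetical helper name, not part of any library.

```ruby
require 'open3'

# Hypothetical helper: returns a string such as "text/plain", or nil when the
# path is unreadable, file(1) is missing, or file(1) exits with an error.
def mime_type_of(path)
  return nil unless File.readable?(path.to_s)

  # Arguments are passed as a list, so the path never goes through a shell.
  output, status = Open3.capture2('file', '--brief', '--mime-type', path.to_s)
  status.success? ? output.strip : nil
rescue Errno::ENOENT
  nil # file(1) is not installed on this machine
end
```

Calling `mime_type_of(path)` then takes the place of the backtick expression above.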

Patrick Ritchie
  • 18
    To prevent nasty hackery, try using popen: `IO.popen(["file", "--brief", "--mime-type", path], in: :close, err: :close).read.chomp` – sj26 May 22 '12 at 02:11
  • Yup, this or the `cocaine` gem. – maletor Feb 10 '14 at 21:44
  • 8
    @sj26 Each time I call `popen`, I get a zombie process because the IO object is not closed. To fix that, use a block: `IO.popen(["file", "--brief", "--mime-type", path], in: :close, err: :close) { |io| io.read.chomp }` – Andrew Apr 17 '14 at 21:31
  • @sj26 what do you mean by "nasty hackery"? Are backticks considered harmful? – Pete May 02 '16 at 22:06
  • 1
    @Pete interpolating potentially user supplied content into a command string like backticks is a potential security vulnerability. Using popen with an array of arguments prevents this category of exploit. :-) – sj26 May 07 '16 at 02:48
  • 1
    Excellent point about zombies! `IO.popen(["file", "--brief", "--mime-type", path], &:read).chomp` works, too. – sj26 May 07 '16 at 02:50
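
The difference between the two calling styles discussed in these comments can be seen without touching file(1) at all. This is a small illustrative sketch that uses `echo` as a harmless stand-in command; the hostile path is made up for the demonstration.

```ruby
# Hypothetical hostile input: a "file name" that carries a second shell command.
path = "foo.zip; echo INJECTED"

# Backticks hand the whole string to the shell, so the text after ";" runs
# as a separate command.
via_shell = `echo #{path}`

# The array form passes path as one argument; no shell ever parses it.
via_array = IO.popen(["echo", path], &:read)

puts via_shell.inspect
puts via_array.inspect
```

With the array form the whole string reaches the command as a single argument, which is why it is the safer choice for user-supplied paths.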
14

I found shelling out to be the most reliable. For compatibility on both Mac OS X and Ubuntu Linux I used:

file --mime -b myvideo.mp4
video/mp4; charset=binary

Ubuntu also prints video codec information if it can, which is pretty cool:

file -b myvideo.mp4
ISO Media, MPEG v4 system, version 2
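
If you capture the `--mime` output from Ruby, the type and charset can be separated with plain string handling. A minimal sketch, assuming the output has the `type; charset=value` shape shown above; `parse_mime_output` is a hypothetical helper name.

```ruby
# Split a `file --mime -b` result such as "video/mp4; charset=binary".
def parse_mime_output(output)
  type, params = output.strip.split(/;\s*/, 2)
  # params is nil when no charset was printed; nil.to_s is "", so the lookup gives nil.
  charset = params.to_s[/charset=([^;\s]+)/, 1]
  { type: type, charset: charset }
end

p parse_mime_output("video/mp4; charset=binary")
```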

jamiew
10

You can use this reliable method based on the magic header of the file:

require 'open3'

def get_image_extension(local_file_path)
  # The first 10 bytes are enough to cover every signature checked below.
  header = File.binread(local_file_path, 10).to_s
  case header
  when /\AGIF8/n
    'gif'
  when /\A\x89PNG/n
    'png'
  when /\A\xff\xd8\xff\xe0\x00\x10JFIF/n, /\A\xff\xd8\xff\xe1.{2}Exif/mn
    'jpg'
  else
    # Fall back to file(1), which works on Linux and Mac. Passing the arguments
    # separately keeps the path away from the shell.
    mime_type, status = Open3.capture2('file', '--brief', '--mime-type', local_file_path)
    # UnprocessableEntity is assumed to be defined elsewhere in the application.
    raise UnprocessableEntity, "unknown file type" unless status.success? && mime_type.include?('/')
    mime_type.strip.split('/')[1].gsub('x-', '').gsub('jpeg', 'jpg').gsub('text', 'txt')
  end
end
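
The same magic-header idea can also be written as a lookup table, which makes it easier to add formats later. This is only a sketch under stated assumptions: the table covers just the three image formats from this answer, and `SIGNATURES` and `sniff_extension` are hypothetical names.

```ruby
# Leading bytes for each format, stored as binary strings.
SIGNATURES = {
  "GIF8".b         => "gif",
  "\x89PNG".b      => "png",
  "\xFF\xD8\xFF".b => "jpg"
}.freeze

# Returns the matching extension, or nil when no signature matches.
def sniff_extension(path)
  header = File.binread(path, 8).to_s
  SIGNATURES.each do |magic, ext|
    return ext if header.start_with?(magic)
  end
  nil
end
```

Because only the first bytes are read, the file name and extension play no part in the result.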
Alain Beauvois
10

This was added as a comment on this answer but should really be its own answer:

path = "/path/to/your/file" # replace with the path to your file

IO.popen(
  ["file", "--brief", "--mime-type", path],
  in: :close, err: :close
) { |io| io.read.chomp }

I can confirm that it worked for me.

Jason Swett
  • 2
    This works perfectly with the added bonus of not needing to add and maintain yet another gem. – Steven Hirlston Nov 14 '19 at 18:11
  • This works but it trusts the extension as far as I know. It is probably good in most cases but using the magic number of the file is safer. In most cases it is obviously not a problem. The only reason why I mention this is because I just had to fix a bug where a file had ".jpeg" extension but was really a Gif. It was a pain to debug because most methods use the extension. – Mig Aug 25 '21 at 12:40
7

If you're using the File class, you can augment it with the following methods based on @PatrickRitchie's answer:

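class File
  # MIME type reported by file(1), e.g. "video/mp4".
  def mime_type
    IO.popen(["file", "--brief", "--mime-type", path], &:read).strip
  end

  # Character set reported by file(1), e.g. "binary".
  def charset
    IO.popen(["file", "--brief", "--mime", path], &:read)[/charset=(\S+)/, 1]
  end
end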

And, if you're using Ruby on Rails, you can drop this into config/initializers/file.rb and have it available throughout your project.

spyle
5

For those who came here via a search engine, a modern approach to finding the MIME type in pure Ruby is to use the mimemagic gem.

require 'mimemagic'

MimeMagic.by_magic(File.open('tux.jpg')).type # => "image/jpeg" 

If you feel that it is safe to use only the file extension, then you can use the mime-types gem:

require 'mime/types'

MIME::Types.type_for('tux.jpg') # => [#<MIME::Type: image/jpeg>]
Paulo Fidalgo
2

You could give shared-mime a try (gem install shared-mime-info). It requires the Freedesktop shared-mime-info library, but does both filename/extension checks and "magic" checks. I tried giving it a whirl myself just now, but I don't have the Freedesktop shared-mime-info database installed and have to do "real work," unfortunately. Still, it might be what you're looking for.

Chris Ingrassia
1

Pure Ruby solution using magic bytes and returning a symbol for the matching type:

https://github.com/SixArm/sixarm_ruby_magic_number_type

I wrote it, so if you have suggestions, let me know.

joelparkerhenderson
1

I recently found mimetype-fu.

It seems to be the easiest reliable solution to get a file's MIME type.

The only caveat is that on a Windows machine it only uses the file extension, whereas on *Nix based systems it works great.

Pranav 웃
0

The best I found so far:

http://bogomips.org/mahoro.git/

knoopx
-1

The mime-types gem for Ruby works well.

Qianjigui
-3

You could give a go with MIME::Types for Ruby.

This library allows for the identification of a file’s likely MIME content type. The identification of MIME content type is based on a file’s filename extensions.

Seki
Bobby Jack
  • 6
    From Readme.txt: "The identification of MIME content type is based on a file's filename extensions". OP explicitly requested a method based on content analysis, not filename extension. – Martin Carpenter May 23 '09 at 14:37