2

Once I have a File instance, I would like to check if it matches a file format and extend that instance with corresponding methods:

module MP3
  def self.extended(base)
    raise "Can only extend a File" unless base.is_a?(File)
    raise "Incorrect file format" unless is_mp3?(base)
  end

  def self.is_mp3?(file)
    # Full metadata check if it is a MP3 format
  end

  def year
    # Extract year from metadata
  end
end

song = File.new("song.mp3")
if MP3.is_mp3?(song)
  song.extend(MP3)
  puts song.ctime # Original File method
  puts song.year  # Extended MP3 method
end

picture = File.new("picture.jpg")
MP3.is_mp3?(picture) #=> False
picture.extend(MP3)  #=> raise "Incorrect file format"

I guess that is not conventional but my needs are:

  • Be able to handle several file formats.
  • Open a file before knowing its format.
  • Re-use the same File instance without having to create a new object. (see below)
  • Have both original File methods and format specific methods in the same object.
  • Check the file format is correct before adding corresponding methods.

Is this approach correct?

This question is a follow-up of a previous question.

I want to extend the existing File instance instead of creating a new instance because I am using a wrapper for File, that holds the whole file in RAM (read from a tape drive that does not allow sequential access).

Victor
  • 1,680
  • 3
  • 22
  • 40

1 Answers1

2

What you're proposing puts too much of the logic of choosing which class to use into the caller's code. Every time you add a new file type, you need to change everywhere your code is used.

Instead, use a Factory pattern. Write a class (the factory) which examines a filename and decides what to do. Except I'm going to use the much superior Pathname.

require "pathname"

class Pathname::Format
  def self.from_filename(filename)
    path = Pathname.new(filename)

    from_pathname!(path)

    return path
  end

  def self.from_pathname!(path)
    case path.extname
    when ".mp3"
      path.extend(MP3)
    when ".jpg"
      path.extend(JPG)
    end

    return
  end
end

Point is to put that decision into the factory class, not in the calling code.

Then you can write your modules.

module JPG
  def type
    return "JPG"
  end
end

module MP3
  def type
    return "MP3"
  end

  def year
    puts "MP3 year called"
  end
end

Now the caller just uses the factory.

# From a filename
song = Pathname::Format.from_filename("song.mp3")
puts song.ctime # Original File method
puts song.year  # Extended MP3 method

# From a Pathname
picture = Pathname.new("picture.jpg")
Pathname::Format.from_pathname!(picture)
puts picture.type

Instead of having a proliferation of special methods to check if an object is of a particular type, either check a type method, check if it's a kind of a module, or rely on duck typing.

if song.type == "MP3"
  puts song.year
end

if song.kind_of?(MP3)
  puts song.year
end

if song.respond_to?("year")
  puts song.year
end
Schwern
  • 153,029
  • 25
  • 195
  • 336
  • As I said, in my case the file is opened before and I don't want to create a new instance. That is why I was looking into extending the instance, not the class. Basically it is because I would need to pass much more stuff than `filename`, using lots of memory. So this approach of creating an instance straight from the `filename` does not work for me. – Victor Oct 20 '16 at 20:15
  • @Victor The point is to put all that logic into the factory class. I rejiggered the factory to demonstrate that. – Schwern Oct 20 '16 at 20:21
  • @Victor Let the class housing the factory method do the opening for you. What do you mean "lots more memory"? Arbitrarily extending objects is more punishing in terms of performance than using a stock class instance of a specific subclass. – tadman Oct 20 '16 at 20:27
  • @tadman As I said the file is already opened, I need to reuse the same instance because it is using a GB of RAM and I don't want to be moving this around creating new objects. I'm not really working with MP3s pathnames, that's why I specified the (unconventional) requirements of my case. – Victor Oct 20 '16 at 20:31
  • @Victor We're talking about code that's going to be run in the future, aren't we? So you *could* restructure your app to encapsulate whatever file open logic you've already implemented, just move that code around. If you're really stuck, which would be very unusual, make the factory method wrap around the existing File object. This is better than randomly jamming junk into a File instance, having your own subclass gives you the authority to add *whatever* without concern for conflicts with File's own internals. – tadman Oct 20 '16 at 20:33
  • You make a class like MediaFile that does whatever special open behaviour is required, and selects the appropriate subclass based on file extension. I'd *also* encourage you to actually test that the file is the right format by doing a quick sniff test much like how the command-line `file` tool tests for magic bytes in the headers. Just because it's a `.mp3` file does not mean it's actually an MP3 file, it could be a PNG or an Excel file. – tadman Oct 20 '16 at 20:36
  • @Victor The factory class doesn't know anything about playing sound or displaying pictures. It just knows how to look at a file, determine what type it is, and choose the appropriate module to extend it with. THAT module knows how to play a sound (MP3) or display a picture (JPG). This allows a nice separation of responsibilities, and the caller doesn't have to know what file type it is beforehand. Look at the code in the answer (note I heavily edited it). Also I have a sneaking suspicion creating tiny File objects isn't eating a gig of RAM. Your memory problem likely lies elsewhere. – Schwern Oct 20 '16 at 20:37
  • @Schwern Yes, thank you for your edit. Tell me if I'm wrong but `Pathname::Format` depends on all format modules? – Victor Oct 20 '16 at 20:40
  • @tadman I agree, in fact I am already using a wrapper for `File` (that's why it's so big in RAM) and I do check all the binary metadata when `extend`ing to make sure the file format is correct. – Victor Oct 20 '16 at 20:44
  • @Victor I'm not sure what you mean by "depends on all format modules". If by "format modules" you mean `MP3` and `JPG` then yes, it uses those format modules when it calls `extends`. They'll have to be loaded somehow. There's various patterns to do that on demand, but again, I don't think this is where your memory issues are coming from. You're probably just keeping too many objects in memory at once. That would be an algorithmic fix, not a micro-optimization like you're doing. – Schwern Oct 20 '16 at 20:46
  • @Schwern When I require your `Pathname::Format` for my music player I need to require `MP3` and `JPEG` as well? In my approach I just need to require `MP3`. – Victor Oct 20 '16 at 20:51
  • @Victor In your approach, the caller has to already know what sort of filename you desire to have, decide how to determine it's that type, and hard code which module will handle it. This is error prone, brittle, and violates [DRY](https://en.wikipedia.org/wiki/DRY_principle). When you require `Pathname::Format` it takes care of loading all its required modules for you. It can do that by inlining them, or it can load them all at once, or it can load them on demand. Because the caller knows none of these details it can change what modules it uses and how it loads them later. – Schwern Oct 20 '16 at 20:56
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/126271/discussion-between-schwern-and-victor). – Schwern Oct 20 '16 at 20:56
  • @Schwern I don't think it is micro-optimisation in my case: I can read a 100GB magnetic tape, hold it in RAM while checking what file format it is to decide on which disk it should be written. By creating a new object I would be duplicating that in RAM. – Victor Oct 20 '16 at 20:56