
My program can read several dozen file formats, using the traditional approach where I write procedural code for each file format. Most of these formats have their own unique loader library, their own bugs, their own limitations, and the whole thing is a huge time sink for me. I'd like to support a ton of other formats, but they're mostly not worth my time because they're not popular enough.

I'd like to replace my existing loaders with a single loader powered by a file format descriptor. I'm certain that someone has created software to learn file formats by example. My existing loaders would make excellent fitness functions for those formats, and I can write fitness functions for new formats too.
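To make the idea concrete, here is a minimal sketch of a descriptor-driven loader scored against an existing loader. Everything in it is hypothetical: the toy format, the descriptor layout (a list of field names with Python `struct` codes), and the names `generic_load`, `reference_load`, and `fitness` are invented for illustration and do not come from any existing tool. It only covers fixed-size header fields, which is far simpler than real formats.

```python
import struct

# Hypothetical descriptor for an invented toy format:
# a list of (field name, struct format code) pairs.
TOY_DESCRIPTOR = [
    ("magic", "4s"),   # 4 raw bytes
    ("width", "H"),    # unsigned 16-bit integer
    ("height", "H"),   # unsigned 16-bit integer
]

def generic_load(data, descriptor, byte_order="<"):
    """Decode fixed-size header fields from bytes according to a descriptor."""
    fmt = byte_order + "".join(code for _, code in descriptor)
    size = struct.calcsize(fmt)
    if len(data) < size:
        raise ValueError("data shorter than the described header")
    values = struct.unpack(fmt, data[:size])
    return dict(zip((name for name, _ in descriptor), values))

def reference_load(data):
    """Stand-in for an existing hand-written loader (assumed, not real code)."""
    return {
        "magic": data[0:4],
        "width": int.from_bytes(data[4:6], "little"),
        "height": int.from_bytes(data[6:8], "little"),
    }

def fitness(descriptor, samples):
    """Fraction of samples where the descriptor agrees with the reference loader."""
    if not samples:
        return 0.0
    hits = 0
    for data in samples:
        try:
            if generic_load(data, descriptor) == reference_load(data):
                hits += 1
        except (struct.error, ValueError):
            pass  # a descriptor that cannot decode the sample scores nothing
    return hits / len(samples)

# Three synthetic sample files in the toy format.
samples = [
    struct.pack("<4sHH", b"TOY1", w, h)
    for w, h in [(640, 480), (1, 1), (300, 2)]
]
```

A learning system would then search over candidate descriptors and keep those with the highest `fitness` score; the search itself is the hard part and is not shown here.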

My question is, what software can I use to "learn" file formats by example, and how can I convert that "learning" into a descriptor for use with a generic loader?

David

1 Answer


Unless you restrict the problem in some major way, I don't think you're likely to get very far. This would be ideal, but it is beyond the current state of the art. For arbitrary formats you cannot do this: for example, if I give you 200 JPGs, PNGs, BMPs and GIFs, it is highly unlikely that a learning system could learn the formats.

Here are some problems researchers have looked at:

Community
carlosdc
    So a machine can't learn every possible file format by example, but that shouldn't stop me. Where formats have complexities like compression, I would of course provide a decompression function. If some file format is just too complex, I can write a loader by hand, as I do now. I think most file formats aren't beyond the reach of a learning algorithm, and I'd like to do what I can. – David May 10 '13 at 00:27