
Please forgive me if there is a glaringly obvious answer to this question; I haven't found it because I'm not entirely sure what I'm looking for. This may well duplicate a question I haven't found; sorry.

I have a C executable that uses text, audio, video, icons and a variety of different file types. These files are stored locally; the folder structure is large and deep and would need to be installed alongside the application for it to operate correctly (not that I anticipate it being distributed; I'm looking to package my own work for convenience).

In my opinion it would be more convenient if the file library were stored in a single file that remained accessible to the application, for example alongside /usr/bin/APPLICATION or in the most appropriate location, and accessed by the executable when required.

I searched for similar questions and found suggestions that indicated two possible options: resource files, which appear to be native to Windows, and including files at compile time. The first question leads to an answer similar to the second and doesn't address whether resource files exist for Linux executables. Like the second, it looks at including the data file in the compilation process. This is not so useful, because if I only want to update my resources I'm forced to recompile the entire application (the media is dynamically added).

QUESTION: Is there a way to store a variety of file types in one single file accessible to an executable on Linux, and if so, how would you implement this?

My initial thought was to create a .zip or .gz file, which might also offer compression as an added bonus, but I have no idea how (or if it is even possible) to access data within such a file on the fly. I'm equally uncertain whether there is a specific file type or library that offers a more suitable solution. Also, I know virtually nothing about .dat files; could these be used in this context on a Linux system?

Community
Chortle
  • `.dat` files can mean anything; I don't know of a standard `.dat` file format. `.zip` sounds like a good choice (java uses it for jars); look into `libzip`. – Colonel Thirty Two Aug 21 '15 at 23:13
  • Apparently Zip can easily extract just the file you want, but not tar/gz. See http://gamedev.stackexchange.com/questions/37648/how-can-you-put-all-images-from-a-game-to-1-file – David Zech Aug 21 '15 at 23:13
  • You could store all your filenames in a single textfile, (with sections for each file type, or perhaps one text file for each file type) which is/are parsed by the program. Then you would not need to keep recompiling the source, where you presumably hard code your resource file names. "Resource files" is a term usually associated with the compilation, such as defining icons, cursors and dialogs, rather than data to process. – Weather Vane Aug 21 '15 at 23:38
  • Many programs use the "zip" format with a different file extension to pretend to the world it's not a zip. Office 2007 and newer use .docx, .xlsx etc., which are in truth all .zip files. Java uses .jar, which is in truth a .zip file as well, and even some games like Quake 3 (which is open-source, by the way) use a .pak that, again, is another .zip file in disguise. I can only presume there's a library out there that makes using zips to store static content very easy. – Havenard Aug 22 '15 at 00:10
  • On Mac OS X a `.app` program is actually a directory-and-file structure laid out in a specified way and flagged to be treated like a program... not even .zipped into a single file. – Stephen P Aug 22 '15 at 01:08
  • @StephenP But the file system knows that such directories and enclosed files are unlikely to be modified, so then it optimizes them with sub-block allocation and compression. Down the rabbit hole… – Potatoswatter Aug 22 '15 at 02:32

4 Answers

I do not understand why you would use a single file at all. Considering the added complexity (and increased chance of bugs creeping in) of file extraction and the associated overheads, I do not see how it would be "more convenient".

I have a C executable that uses text, audio, video, icons and a variety of different file types.

So do many other Linux applications. The normal approach, when using package management, is to put the architecture independent data (icons, audio, video, and so on) for application /usr/bin/YOURAPP in /usr/share/YOURAPP/, and architecture dependent data (like helper binaries) in /usr/lib/YOURAPP. It is extremely common for the latter two to be full directory trees, sometimes quite deep and wide.

For locally compiled stuff, it is common to put these in /usr/local/bin/YOURAPP, /usr/local/share/YOURAPP/, and /usr/local/lib/YOURAPP/ instead, just to avoid confusing the package manager. (If you check ./configure scripts or read Makefiles, this is the chief purpose of the PREFIX variable they support.)

It is also common for the /usr/bin/YOURAPP to be a simple shell script, setting environment variables, or checking for user-specific overrides (from $HOME/.YOURAPP/), ending up with exec /usr/lib/YOURAPP/YOURAPP.bin [parameters...], which replaces the shell with the actual binary executable without leaving the shell in memory.
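From the C side, the layout described above usually comes down to joining a base directory with a relative resource name. Here is a minimal sketch of that, under some assumptions: the environment variable `YOURAPP_DATADIR` and the helper names `data_dir` and `join_resource_path` are made up for illustration, and the default path simply mirrors the placeholder /usr/share/YOURAPP used above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Placeholder default, mirroring /usr/share/YOURAPP from the text above. */
#define DEFAULT_DATADIR "/usr/share/YOURAPP"

/* Return the data directory: the (hypothetical) YOURAPP_DATADIR override
 * if a wrapper script or the user has set it, else the compiled-in default. */
const char *data_dir(void)
{
    const char *base = getenv("YOURAPP_DATADIR");
    if (base == NULL || base[0] == '\0')
        base = DEFAULT_DATADIR;
    return base;
}

/* Write "base/relative" into buf.
 * Returns 0 on success, -1 if the result does not fit in size bytes. */
int join_resource_path(char *buf, size_t size,
                       const char *base, const char *relative)
{
    int n = snprintf(buf, size, "%s/%s", base, relative);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

A caller would then do something like `join_resource_path(path, sizeof path, data_dir(), "icons/app.png")` and pass `path` to `fopen`. Updating a resource means replacing one file on disk; nothing is recompiled.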

As an example, /usr/share/octave/ on my machine contains a total of 138 directories (in a hierarchy of up to 7 directories deep) and 1463 files; about ten megabytes of "stuff" all told. LibreOffice, Eagle, Fritzing, and KiCAD take hundreds of megabytes there each, so Octave is not an extreme example in any way either.

Nominal Animal
  • I agree, and see the [Filesystem](https://wiki.linuxfoundation.org/en/FHS) [Hierarchy](http://www.pathname.com/fhs/) [Standard](https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard) for details on where to put what. – Stephen P Aug 22 '15 at 01:01
  • I think this answers the question from the perspective of should I create a single file of resources for an application. I don't actually feel so bad having a sizeable collection of resources. – Chortle Aug 22 '15 at 06:42
  • I do think there may be contexts whereby an individual may not want their resources available such as a licensed piece of software or a game. Resources might be time consuming to generate and the individual may not want them distributed separately. However I agree with you in this context, so thank you. – Chortle Aug 22 '15 at 06:45
  • @Chortle: Encrypting resource files, or obfuscating content inside an archival file, is typically not the deterrent you seem to think it is. If someone *wants* the contents, they *will* get at it. Just look at games, or any proprietary software, really. Usually, a much more sensible balance is struck by using custom file formats for the resources. Just enough to stop "accidental" copying, but not too complicated to slow down I/O or require much overhead (memory, CPU time used) to implement. – Nominal Animal Aug 22 '15 at 06:51
  • Valid point I guess it's about taking reasonable steps to protect your copyright resources in that instance; custom file formats do seem like a good option although like with any file made public you have to shrug and accept that it probably isn't enough. I'm quite intrigued about how a custom file or indeed any file is put together; particularly after reading KemyLand's response. It's a little beyond me but it's interesting. – Chortle Aug 22 '15 at 07:12

You have several alternatives (TODO: add more ;)):

You can read some archive file format specifications, write code to read and write those archives, and waste your time doing so.

You can invent a dirty, simple file format, for example ("dsa" stands for "Dirty and Simple Archiver"):

#include <stdint.h>

// Located at the beginning of the file
struct DSAHeader {
    char            magic[3];            // Shall be (char[]) { 'D', 'S', 'A' }
    unsigned char   endianness;          // The rest of the file is interpreted according to this field. 0 means little-endian, 1 means big-endian.
    unsigned char   checksum[16];        // MD5 sum of the whole file (when calculating the checksum, this field is treated as if filled with zeros).
    uint32_t        fileCount;
    uint32_t        stringTableOffset;   // A table containing the files' names.
};

// A dsaHeader.fileCount-sized array of DSANodeHeader follows the DSAHeader.
struct DSANodeHeader {
    unsigned char   type;              // 0 means directory, 1 means regular file.
    uint32_t        parentOffset;      // Pointer to the parent directory, or zero if the node is in the root.
    uint32_t        offset;            // The node's type-dependent header starts here.
    uint32_t        nodeSize;          // In bytes for files, and in number of entries for directories.
    uint32_t        dataOffset;        // The file's data starts at this offset for files, and a pointer to the first DSADirectoryEntryHeader for directories.
    uint32_t        filenameOffset;    // Relative to the string table.
};

typedef uint32_t    DSADirectoryEntryHeader;    // Offset to the entry's DSANodeHeader

The "string table" is a contiguous sequence of null-terminated character strings.
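One practical detail when reading or writing such a format: dumping the structs with a raw `fwrite` ties the file to the compiler's padding (an `unsigned char` followed by a `uint32_t` is typically padded) and to the machine's byte order. Here is a rough sketch of one way around that, assuming every `uint32_t` is always stored little-endian on disk. That is a simplification of the format above, not part of it, since it makes the `endianness` field unnecessary. The helper names `put_u32le`, `get_u32le` and `dsa_name` are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store v at p as four bytes, least significant byte first. */
void put_u32le(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Read four bytes at p as a little-endian 32-bit value.
 * Gives the same result on little- and big-endian hosts. */
uint32_t get_u32le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Look up a name in the string table by its filenameOffset.
 * Returns NULL if the offset is outside the table or the string
 * is not null-terminated before the table ends. */
const char *dsa_name(const char *table, size_t table_size, uint32_t offset)
{
    if (offset >= table_size)
        return NULL;
    if (memchr(table + offset, '\0', table_size - offset) == NULL)
        return NULL;
    return table + offset;
}
```

With helpers like these, each header field is written and read one at a time at a fixed byte position, so the on-disk layout no longer depends on how a particular compiler lays out the structs.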

This format is very simple (and portable ;)). And, as a bonus, if you want (de)compression, you can use something like gzip, bzip2, or xz to (de)compress your file (those programs/formats are archiver-agnostic, i.e., not dependent on tar, as commonly believed).

As a last (or first?) resort, you may use an existing library/API for manipulating archives and compressed file formats.

Edit: Added support for directories :).

3442
  • I really like the idea of a file format, although this is probably a little outside my skillset. Would you need to use a function to serialise the data to a `uint32_t`, then deserialise, or am I completely off here? – Chortle Aug 22 '15 at 06:59
  • You're right. If you want to, you can get rid of the `endianness` field and do simple raw reads and writes. But if you want to go really portable, you should leave that field there and convert between little endian and big endian as necessary. For quick and dirty purposes, you may also remove the checksum for now if you want to. – 3442 Aug 22 '15 at 07:02
  • This is going to sound really dumb, but what is endian? I'd be quite interested to read up on custom file formats; do you know of any good resources? I could google, but there's such a mix of quality it's always worth asking first. – Chortle Aug 22 '15 at 07:18
  • Well, I'm probably not the best one to explain to you what endianness is. Google it or something :). For now, you just need to know that if different machines have different endianness, the data in a struct's fields will get messed up. And to give you some calm, endianness is the same across all PCs (Intel architecture), with it being what is known as "little-endian". – 3442 Aug 22 '15 at 07:57
  • With respect to file formats, you can get locked in for some universe-lifetimes reading [this very list of various file formats, with links to documentation et al](http://www.online-convert.com/file-type). However, "custom" file formats are custom by nature, so they're not standardized and don't have a specification :). If, for example, I wrote documentation for the "DSA" format above, it would immediately become standardized, but I haven't, so it isn't. – 3442 Aug 22 '15 at 08:01

I have a C executable that uses text, audio, video, icons and a variety of different file types. These files are stored locally; the folder structure is large and deep and would need to be installed alongside the application for it to operate correctly.

Consider the complexity you already have: many different file types in a large, deep folder structure that must be installed with the application. With a single resource file it would be difficult, perhaps nearly impossible, to track changes if you want to update resources dynamically. Embedding the resources in the executable is not an option either, as it would increase the size of the executable and require frequent recompilation whenever the resources are updated.

Taking all aspects of your project into account, it seems to me the solution would be an INI file. The INI file would be stored at a fixed location and would list the locations of the other resources. In it you can easily store resource locations, hashes, and sizes, which makes it easy to detect changes or update the resources.

Since your files are already in compressed formats, general-purpose zip algorithms would gain little, as the compression ratio would be very low. I would therefore recommend the algorithms used by 7z. Of these, I suggest xz, which many open-source projects currently use to compress their binaries and reduce their size.

For each compressed file, its CRC32 or hash value should also be included in the INI file so that the validity of the transferred data can be checked.
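To make the integrity check concrete, here is a small sketch of computing a CRC-32 for a resource so it can be compared against a value stored in the INI file. It uses the common CRC-32 variant (the one found in zip, gzip and PNG) in its plain bitwise form. The function names `crc32_update` and `stream_crc32` are made up for this example, and how the expected value is read from the INI file is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Feed len bytes into a running CRC-32 (reflected polynomial 0xEDB88320).
 * Start with crc = 0; pass the previous result back in to continue. */
uint32_t crc32_update(uint32_t crc, const unsigned char *data, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Compute the CRC-32 of everything remaining in fp.
 * Returns 0 and stores the result in *out, or -1 on a read error. */
int stream_crc32(FILE *fp, uint32_t *out)
{
    unsigned char buf[4096];
    uint32_t crc = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        crc = crc32_update(crc, buf, n);
    if (ferror(fp))
        return -1;
    *out = crc;
    return 0;
}
```

The application would open a resource, call `stream_crc32`, and refuse to use the file (or fetch it again) if the result differs from the value recorded for it. A table-driven CRC or a library routine would be faster for large media files; the bitwise loop is only meant to show the idea.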

Vineet1982

Let's say you have:

top-level-folder/
  |
   - your-linux-executable
   - icon-files-folder/
   - image-files-folder/
   - other-folders/
   - other-files

Do this (from the directory that contains top-level-folder):

tar zcvf my-package.tgz top-level-folder

To expand, do this:

tar zxvf my-package.tgz
FractalSpace