
I have quite a long chain of numbers between 0 and 3. I'm wondering how I can save it so that it uses as little disk space as possible. I've been thinking about it, and I have noticed that a number from 0 to 3 can also be written as two binary digits, so it might be a good idea to save it in binary form.

I would also like to save it in a custom file type, so that my app is the only one that reads it, but it's not essential (though it would be appreciated).

I also found these questions, which may be useful, but I haven't found a way to do it:

But I can't find any question, blog post or anything else about how to create a custom file type on iOS, only questions about how to associate one with an app.

If you need some more information, ask me!

Thanks in advance!!!

EDIT:

The chain will be in the hundreds; I think it will be shorter than 1000. What I'm really trying to do is save invented DNA data. The DNA bases are A, G, C and T, which can be converted to 0, 1, 2, 3, but if I save them as a text file it would be much bigger. The only thing I'll do with the data is calculate the complementary base (G > C, C > G, A > T, T > A).
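With the mapping in the question (A=0, G=1, C=2, T=3), each complementary pair adds up to 3, so computing the complement needs no lookup table: it is just 3 minus the code. A minimal C sketch under that assumption; the helper names baseToCode, codeToBase, complementCode and complementStrand are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helpers, assuming the mapping A=0, G=1, C=2, T=3 from the question.
static const char kBases[4] = { 'A', 'G', 'C', 'T' };

// Returns the 2-bit code for a base letter, or -1 for any other character.
int baseToCode(char base)
{
    switch (base)
    {
        case 'A': return 0;
        case 'G': return 1;
        case 'C': return 2;
        case 'T': return 3;
        default:  return -1;
    }
}

// Returns the letter for a code in the range 0..3.
char codeToBase(int code)
{
    assert(code >= 0 && code <= 3);
    return kBases[code];
}

// A<->T (0<->3) and G<->C (1<->2) always add up to 3, so the complement
// is 3 - code (the same as code ^ 3 for two-bit values).
int complementCode(int code)
{
    assert(code >= 0 && code <= 3);
    return 3 - code;
}

// Writes the complementary strand of `strand` into `out`
// (`out` needs the same length as `strand` plus one byte for '\0').
void complementStrand(const char *strand, char *out)
{
    size_t i = 0;
    for (; strand[i] != '\0'; i++)
    {
        out[i] = codeToBase(complementCode(baseToCode(strand[i])));
    }
    out[i] = '\0';
}
```

For example, complementStrand("AGCTTGCA", out) with a 9-byte out buffer produces "TCGAACGT". A mapping in which complementary bases do not add up to 3 would need a lookup table instead.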

Thank you again for your attention!!

Garoal
  • How long? Are we talking hundreds, thousands, millions? The least disk space would be consumed by bit-packing the numbers (2 bits per number), but the algorithms would be trickier to debug. Also, what kinds of calculations do you do with those number strings? You don't need a "custom filetype" in the Windows/MacOS sense of the word. Just come up with a format, and write a parser/serializer for it. – Seva Alekseyev Jun 19 '12 at 19:27
  • I assume you are talking about integers? What's wrong with packing the 2-bit values? This can be preceded by a count value to indicate how long the array is. – Jim Jun 19 '12 at 19:30
  • I will edit my question to answer both comments, and provide some more information – Garoal Jun 19 '12 at 19:31
  • Hundreds is not a big deal at all. You could really use any approach for this without needing to worry about memory. – Christian Jun 19 '12 at 19:44
  • Storing them as 4-byte integers would probably be too much :) Go with a one-byte-per-base format. You can even use the ASCII characters AGCT, and use NSString for the in-memory representation. – Seva Alekseyev Jun 19 '12 at 19:58
  • OK, I'll try all this, and if I get something good I'll answer my own question. Thank you all!! – Garoal Jun 19 '12 at 20:23
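Jim's suggestion (a count followed by the packed 2-bit values) and Seva's advice to define your own format and write a serializer/parser for it can be combined into a small sketch. The layout here (a 4-byte little-endian count, then four codes per byte, first code in the lowest bits) is an assumption for illustration, not an established format, and the function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical layout (an assumption, not a standard): a 4-byte little-endian
// count, then the 2-bit codes packed four per byte, first code in the lowest bits.

// Number of bytes needed to store `count` codes in this layout.
size_t packedSize(uint32_t count)
{
    return 4 + ((size_t)count + 3) / 4;
}

// Writes the header and the packed codes (each 0..3) into `out`, which must
// hold packedSize(count) bytes. Returns the number of bytes written.
size_t serializeCodes(const uint8_t *codes, uint32_t count, uint8_t *out)
{
    size_t total = packedSize(count);
    for (size_t j = 0; j < total; j++)
        out[j] = 0;

    // Header: the count, least significant byte first.
    for (int b = 0; b < 4; b++)
        out[b] = (uint8_t)(count >> (8 * b));

    for (uint32_t i = 0; i < count; i++)
    {
        assert(codes[i] <= 3);
        out[4 + i / 4] |= (uint8_t)(codes[i] << ((i % 4) * 2));
    }
    return total;
}

// Reads the header and unpacks the codes into `codes` (room for `maxCodes`).
// Returns false if the input is truncated or holds more than `maxCodes` codes.
bool deserializeCodes(const uint8_t *in, size_t inLen,
                      uint8_t *codes, uint32_t maxCodes, uint32_t *countOut)
{
    if (inLen < 4)
        return false;

    uint32_t count = 0;
    for (int b = 0; b < 4; b++)
        count |= (uint32_t)in[b] << (8 * b);

    if (count > maxCodes || inLen < packedSize(count))
        return false;

    for (uint32_t i = 0; i < count; i++)
        codes[i] = (uint8_t)((in[4 + i / 4] >> ((i % 4) * 2)) & 0x03);

    *countOut = count;
    return true;
}
```

The buffer can then be written with fwrite(buf, 1, n, file) to a file opened in "wb" mode. For 1000 bases that is 4 + 250 = 254 bytes; the explicit count is what lets the reader ignore the padding bits in a partially filled last byte.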

2 Answers


I see on your profile that you're 15, so I'm assuming that you're doing this for fun and to learn. If you were doing this professionally, you would probably want to go for a standard CSV format, because it's the fastest and easiest thing to implement. Since you're dealing with fewer than a thousand values, the file will never be bigger than 2 KB (one character plus one comma per value), which is negligible. Your app file is likely thousands of times bigger.

Example file format:

0,1,2,3,2,1,3,0

or even

A,G,C,T,C,G,T,A
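Writing and reading the letter version takes very little code, which is the point of this suggestion. A minimal sketch in plain C (usable from an Objective-C iOS project as well), assuming one base letter per comma-separated field; basesToCsv and csvToBases are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Writes `len` base letters as "A,G,C,T" into `out`,
// which must hold at least 2 * len + 1 bytes.
void basesToCsv(const char *bases, size_t len, char *out)
{
    size_t pos = 0;
    for (size_t i = 0; i < len; i++)
    {
        if (i > 0)
            out[pos++] = ',';
        out[pos++] = bases[i];
    }
    out[pos] = '\0';
}

// Parses "A,G,C,T" (optionally ending in a newline) back into base letters.
// Assumes well-formed input; returns the number of bases stored.
size_t csvToBases(const char *csv, char *bases, size_t maxBases)
{
    size_t count = 0;
    for (const char *p = csv; *p != '\0' && count < maxBases; p++)
    {
        if (*p == ',' || *p == '\n' || *p == '\r')
            continue;  // skip separators and line endings
        assert(strchr("AGCT", *p) != NULL);
        bases[count++] = *p;
    }
    return count;
}
```

The resulting string can be saved with fputs and read back with fgets before parsing. For 1000 bases the text is 1999 bytes, consistent with the 2 KB estimate above.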

If you are doing this just for fun or to learn, and not charging $100/h, I would go for the binary format with 2-bit values. It's a bit of a challenge and probably fun to do. However, the extra time would never be worth it for a paying customer: the saving has an extremely small effect on the app's performance and memory usage, and every extra hour you spent on it would cost the customer an extra $100.

Of course, if you were dealing with billions of values, this optimization would make sense, but with hundreds of values it's not worth the extra effort.

Erik B
  • You're right, I'm doing this as a hobby, and of course for learning. I think the .csv format is what I need, and I can simply change the file extension to create a file that only my app will recognize. Great answer. Thanks! – Garoal Jun 20 '12 at 05:38
  • @Asterix22, Just FYI, on a UNIX system the file extension is of little importance. No matter what file extension you choose, any text editor will be able to open it. However, you do not need to worry about that, iOS apps are sandboxed and do not have access to each other's data. If you are dealing with sensitive data you should encrypt it. Changing the file extension will not protect it at all, it will only make it easier to identify the file format. So by all means use a custom file extension, but do not expect to get any added security for doing so. – Erik B Jun 21 '12 at 10:35
  • Yeah, it's not for security; I only want a file format that my app recognizes, because I also want to send that data by e-mail, so it's easier. But I have already found out how to do it. – Garoal Jun 21 '12 at 13:03

This should give you an idea of how to pack four 2-bit values into a single char. I'd rather read CSV data than this, but you should be aware of how to pack it.

Or use the compression libraries you have at your fingertips (zlib and such) to output a compressed stream. Or compress the bit stream created here for an even smaller footprint.

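#include <assert.h>
#include <stddef.h>
#include <stdio.h>

// Placeholder sink for the packed bytes; here it writes to a FILE opened with "wb".
static void outputByte(unsigned char byte, FILE *out)
{
    fputc(byte, out);
}

// Given sourceArray as an array of ASCII chars containing '0' through '3',
// packs four 2-bit values into each output byte, first value in the lowest bits.
void packValues(const char *sourceArray, size_t numBytesInSourceArray, FILE *out)
{
    unsigned char accumulator = 0;

    for (size_t i = 0; i < numBytesInSourceArray; i++)
    {
        int value = sourceArray[i] - '0';   // '0'..'3' -> 0..3
        assert(value >= 0 && value <= 3);

        int shift = (int)(i & 0x03) * 2;    // 0, 2, 4, 6, 0, 2, ...
        accumulator |= (unsigned char)(value << shift);

        if (shift == 6)
        {
            // Four values collected: emit the full byte and start a new one.
            outputByte(accumulator, out);
            accumulator = 0;
        }
    }

    // Flush a partially filled last byte (its unused high bits stay zero).
    if (numBytesInSourceArray & 0x03)
    {
        outputByte(accumulator, out);
    }
}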
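Reading the data back requires the inverse operation, and because the question's only operation is the complement, the packed form offers a shortcut. A sketch assuming the same layout as the packing loop (four values per byte, first value in the lowest bits) and the question's mapping A=0, G=1, C=2, T=3; unpackValues and complementPacked are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical counterpart to the packing loop: reads `count` values (0..3)
// back out of `packed` (numBytes long), four per byte, lowest bits first.
void unpackValues(const unsigned char *packed, size_t numBytes,
                  size_t count, unsigned char *values)
{
    assert(count <= numBytes * 4);
    for (size_t i = 0; i < count; i++)
    {
        int shift = (int)(i & 0x03) * 2;
        values[i] = (unsigned char)((packed[i / 4] >> shift) & 0x03);
    }
}

// With A=0, G=1, C=2, T=3 every complementary pair adds up to 3 (binary 11),
// so complementing a base flips both of its bits. XOR with 0xFF therefore
// complements all four bases in a byte at once. Padding bits in a partial last
// byte get flipped too, which is harmless as long as the real count is kept.
void complementPacked(unsigned char *packed, size_t numBytes)
{
    for (size_t j = 0; j < numBytes; j++)
        packed[j] ^= 0xFF;
}
```

The caller has to keep the original count (for example in a header before the packed bytes), since the packed bytes alone cannot tell whether the last byte holds one, two, three or four real values.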
Michael Dorgan