
I am trying to read, in C#, a file that was written with CArchive. From what I can tell the format is:

[length of next set of data][data]...etc

I'm still fuzzy on some of the data, though. How do I read in Date data? What about floats, ints, doubles, etc?

Also, [length of next set of data] could be a byte or word or dword. How do I know when it will be each? For instance, for a string "1.10" the data is:

04 31 2e 31 30

The 04 is obviously the length, and the rest are the hex values for 1.10. Trivial. Later I have a string that is 41 characters long, but the [length] value is:

00 00 00 29

Why 4 bytes for the length? (0x29 = 41)

The main question is: Is there a spec for the format of CArchive output?

Mike Webb
  • I don't know if it was formally specced anywhere - the assumption was that you'd use CArchive to read it back as well. The code itself is the documentation. – Mark Ransom Jan 19 '12 at 19:44
  • [similar question](http://stackoverflow.com/q/55369/1154743) – yrk Jan 19 '12 at 19:47

2 Answers


To answer your question about strings: the length value stored in the archive is itself variable-length, depending on the length and encoding of the string. If the string is shorter than 255 characters, one byte is used for the length. If the string is 255 - 65533 characters, 3 bytes are used - a 1-byte 0xFF marker followed by a 2-byte word. If the string is 65534 or more characters, 7 bytes are used - a 3-byte 0xFF 0xFF 0xFF marker followed by a 4-byte dword. (The word value 0xFFFE cannot serve as a length, because it is reserved for the Unicode marker described next.) To make it even more complicated, if the string is Unicode encoded, the length value is preceded by a 3-byte marker: the byte 0xFF followed by the word 0xFFFE. So in any combination, you will never see a 4-byte length by itself; what you showed has to be three 0x00 bytes belonging to something else, followed by a 1-byte string length of 0x29.
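To make the prefix layout concrete, here is a minimal Python sketch of the writing side. It is an illustration only, not MFC code: the name `encode_length_prefix` is hypothetical, and it assumes little-endian words and dwords (as on x86 Windows). It also assumes that the word value 0xFFFE stays reserved for the Unicode marker, so a length of exactly 65534 is written in the dword form.

```python
import struct


def encode_length_prefix(length: int, is_unicode: bool = False) -> bytes:
    """Build the variable-length string prefix (hypothetical helper).

    Assumes little-endian layout; lengths above 0xFFFFFFFF are not handled.
    """
    out = b""
    if is_unicode:
        # Byte 0xFF followed by the word 0xFFFE flags 16-bit characters.
        out += struct.pack("<BH", 0xFF, 0xFFFE)
    if length < 0xFF:
        # Short string: the length fits in a single byte.
        out += struct.pack("<B", length)
    elif length < 0xFFFE:
        # Medium string: 0xFF marker, then a 2-byte length.
        out += struct.pack("<BH", 0xFF, length)
    else:
        # Long string: 0xFF 0xFF 0xFF marker, then a 4-byte length.
        out += struct.pack("<BHI", 0xFF, 0xFFFF, length)
    return out


print(encode_length_prefix(4).hex())         # prints 04
print(encode_length_prefix(41).hex())        # prints 29
print(encode_length_prefix(300).hex())       # prints ff2c01
print(encode_length_prefix(4, True).hex())   # prints fffeff04
```

Note that a 41-character Ansi string gets the single byte 0x29, which is consistent with the three leading 0x00 bytes in the question belonging to something else.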

So, the correct way to read a string is as follows:

Assume: string data is Ansi unless told otherwise.

  1. Read a byte. If its value is < 255, string length is the value, goto 3.

  2. Read a word. If its value is 0xFFFE, string data is Unicode, goto 1. Otherwise, if its value is < 65535, string length is its value, goto 3. Otherwise, read a dword, string length is its value, goto 3.

  3. Read that many 8-bit or 16-bit values, depending on whether the string is Ansi or Unicode, and then convert to the desired encoding as needed.
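The three steps above can be sketched as follows. This is a minimal Python sketch rather than anything taken from MFC, and the same logic should port directly to C#'s `BinaryReader`. The function name `read_carchive_string` and the `ansi_codec` parameter are hypothetical, and the sketch assumes little-endian values and a Windows-1252 Ansi code page, neither of which is stated in the answer.

```python
import io
import struct


def read_carchive_string(stream, ansi_codec: str = "cp1252") -> str:
    """Read one length-prefixed string from a binary stream (hypothetical helper)."""

    def read_value(fmt: str) -> int:
        # Read exactly one little-endian integer described by fmt.
        size = struct.calcsize(fmt)
        data = stream.read(size)
        if len(data) != size:
            raise EOFError("archive ended inside a string length")
        return struct.unpack(fmt, data)[0]

    char_size = 1                        # assume Ansi unless told otherwise
    length = read_value("<B")            # step 1: one-byte length
    if length == 0xFF:
        length = read_value("<H")        # step 2: two-byte length
        if length == 0xFFFE:             # Unicode marker: start over at step 1
            char_size = 2
            length = read_value("<B")
            if length == 0xFF:
                length = read_value("<H")
        if length == 0xFFFF:
            length = read_value("<I")    # four-byte length

    # Step 3: read the characters and convert them to a Python str.
    raw = stream.read(length * char_size)
    if len(raw) != length * char_size:
        raise EOFError("archive ended inside string data")
    return raw.decode("utf-16-le" if char_size == 2 else ansi_codec)


# The bytes from the question: 04 31 2e 31 30
sample = io.BytesIO(bytes.fromhex("04312e3130"))
print(read_carchive_string(sample))  # prints 1.10
```

Passing the stream in, rather than a byte array, lets the caller keep reading the next item in the archive from wherever the string ended.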

Remy Lebeau
  • "1. Read a byte. If its value is < 255, string length is the value, goto 3." How can you read a byte that its value is over 255? – Steven Shih Sep 11 '12 at 03:10
  • Read what I wrote more carefully. The comparison in step #1 does not include 255 itself. A string length of 255 would fall into step #2 instead, by starting with a `0xFF` marker followed by a 2-byte `WORD` value of 255. – Remy Lebeau Sep 11 '12 at 06:17

According to the documentation:

The main CArchive implementation can be found in ARCCORE.CPP.

If you don't have the MFC source, see this.

wallyk