
I am following this specification of this file format: https://github.com/rouault/dump_gdbtable/wiki/FGDB-Spec

utf16: string in little-endian UTF-16 encoding

How do I read this? I tried BinaryReader.ReadString(), but it returns something like:

"\0e\0y\0w\0o\0r\0d\0\0 \0\0\0\0\rP\0a\0r\0a\0m\0e\0t\0e\0r\0N\0a\0m\0e\0\0 \0\0\0\0\fC\0o\0n\0f\0i\0g\0S\0t\0r\0"

That definitely isn't right.


From the specification:

ubyte: number of UTF-16 characters (not bytes) of the name of the field
utf16: name of the field
ubyte: number of UTF-16 characters (not bytes) of the alias of the field. Might be 0
utf16: alias of the field (omitted if previous field is 0)
ubyte: field type ( 0 = int16, 1 = int32, 2 = float32, 3 = float64, 4 = string, 5 = datetime, 6 = objectid, 7 = geometry, 8 = binary, 9 = raster, 10/11 = UUID, 12 = XML )

Could I somehow use the number of UTF-16 characters to read the name of the field?

Evan Parsons
  • How do you construct the `BinaryReader`? Are you using an overload where you specify the encoding of the text? – Damien_The_Unbeliever Aug 01 '14 at 14:20
  • Normally you specify the encoding, but the [this](http://msdn.microsoft.com/en-us/library/system.text.encoding.aspx) page doesn't list a little-endian UTF-16 encoding. Perhaps you have to write your own encoding somehow (or one of them **is** what you need, not sure). – Sinatr Aug 01 '14 at 14:23
  • `BinaryReader br = new BinaryReader(File.Open("C:\\florida.gdb\\a00000002.gdbtable", FileMode.Open, FileAccess.Read, FileShare.Read | FileShare.Delete));` – Evan Parsons Aug 01 '14 at 14:25
  • @Sinatr - there is such an encoding. It helps to know that in the Windows world, `Unicode` means UTF-16. – Damien_The_Unbeliever Aug 01 '14 at 14:28
  • Do you have an example file somewhere? – Lasse V. Karlsen Aug 01 '14 at 15:00

2 Answers


BinaryReader's ReadString() method has no overload that lets you specify the string length. Instead, it expects the string to be prefixed with its length in bytes, written as a 7-bit-encoded integer, which doesn't match the format in the spec you linked.

Therefore, you cannot use ReadString() directly. Instead, you can:

  1. use ReadByte() to get the string (character) length,
  2. multiply it by 2,
  3. use ReadBytes(count),
  4. use Encoding.Unicode.GetString(bytes).
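
The four steps above can be sketched as a small helper. This is only a minimal illustration, not code from the spec or from the asker's project: the class name `FgdbStrings` and the method name `ReadCountedUtf16` are made up for the example, and it assumes the reader is already positioned on the length byte.

```csharp
using System;
using System.IO;
using System.Text;

static class FgdbStrings
{
    // Hypothetical helper: reads a ubyte character count, then that many
    // UTF-16 little-endian code units, and returns them as a string.
    public static string ReadCountedUtf16(BinaryReader reader)
    {
        int charCount = reader.ReadByte();   // step 1: length in UTF-16 characters
        int byteCount = charCount * 2;       // step 2: two bytes per UTF-16 code unit
        byte[] bytes = reader.ReadBytes(byteCount); // step 3

        // ReadBytes returns fewer bytes than requested if the stream ends early.
        if (bytes.Length != byteCount)
            throw new EndOfStreamException("Stream ended inside a UTF-16 string.");

        return Encoding.Unicode.GetString(bytes);   // step 4: decode as UTF-16LE
    }
}
```

Following the field layout quoted in the question, a field's name, alias and type would then be read with two calls to `ReadCountedUtf16` followed by one `ReadByte()`. A count of 0 (allowed for the alias) simply yields an empty string.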
ulrichb
  • Is multiplying by two necessary? When I do it, it returns something similar to the answer below, except with more Chinese/Japanese characters after it. Code sample: `int count = br.ReadByte() * 2; byte[] array = br.ReadBytes(count); field.nameOfField = Encoding.Unicode.GetString(array);` – Evan Parsons Aug 01 '14 at 16:06
  • The spec says the number of characters, not bytes. Since Encoding.Unicode uses 16-bit code units (2 bytes per char), you want to multiply by 2. You might want to add the code you use to read the string to your question. – CSharpie Aug 01 '14 at 16:09
  • Aha! I think that's it! It returns "Keyword", which I believe is the name of the field. – Evan Parsons Aug 01 '14 at 16:17

It should be:

BinaryReader br = new BinaryReader(File.Open("C:\\florida.gdb\\a00000002.gdbtable",
                                   FileMode.Open,
                                   FileAccess.Read,
                                   FileShare.Read | FileShare.Delete),
                      Encoding.Unicode);

Where Encoding is System.Text.Encoding.


For various historical reasons, Microsoft/Windows refer to UTF-16 (and, specifically, the little-endian variant) as "Unicode" rather than UTF-16.
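
As a small illustration of what "little-endian" means here, the sketch below decodes the same two bytes in both orders. The class and method names are invented for the example. It also suggests one plausible reason a read can produce CJK-looking characters: if decoding starts one byte off, the bytes `00 65` are interpreted as U+6500 instead of `e`.

```csharp
using System;
using System.Text;

static class EndianDemo
{
    // Hypothetical demo method: shows how byte order and alignment
    // change what Encoding.Unicode (UTF-16LE) produces.
    public static void Run()
    {
        byte[] aligned = { 0x65, 0x00 };  // 'e' in UTF-16 little-endian
        byte[] shifted = { 0x00, 0x65 };  // same bytes, read one byte off

        Console.WriteLine(Encoding.Unicode.GetString(aligned));             // e
        Console.WriteLine(Encoding.Unicode.GetString(shifted) == "\u6500"); // True
        Console.WriteLine(Encoding.BigEndianUnicode.GetString(shifted));    // e
    }
}
```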

Damien_The_Unbeliever
  • It returns "攀礀眀漀爀搀\0 \0ЀഀParameterNameЀ \0䌌漀渀昀椀最匀琀爀" when I switch to your encoding. Would I have to strip out the other characters? I'd do that, but I'm afraid of losing them when I go to save it again. – Evan Parsons Aug 01 '14 at 14:39
  • If you get that in return, something is almost certainly wrong. – Lasse V. Karlsen Aug 01 '14 at 14:59
  • The file format doesn't work like this! You have to read the bytes at the specific offset and then interpret them as Unicode. – CSharpie Aug 01 '14 at 16:05