
The .NET BinaryReader/BinaryWriter classes can be constructed with an Encoding to use for string-related operations.

I have been implementing custom string formats as extension methods, and I would like them to respect the Encoding that was specified when the BinaryReader/BinaryWriter was instantiated.

There does not seem to be a way to retrieve the Encoding from the reader/writer, not even when inheriting from these classes. The only option I see is to inherit from them and intercept the passed encoding by recreating all their constructors. I looked into the .NET source code, and the encoding is only used to instantiate a Decoder (in the case of BinaryReader), but I can't access that one either.

Am I running into a shortcoming of those classes here? Can I hack into them with reflection?

Ray
  • OK, you're right. The WriteString method seems to explicitly ignore it though (according to MSDN). – H H Apr 03 '15 at 16:30
  • If subclassing and intercepting the Encoding in the constructors is even remotely feasible in your scenario, I'd prefer it over potentially unstable reflection hacks. (Which I'm not sure are even possible since I didn't see any Encoding reference in the Decoder class.) – nodots Apr 03 '15 at 16:41
  • Can you explain why your extension methods need to operate on the BinaryReader/BinaryWriter themselves? The ReadString/WriteString methods should translate the encoding, so if your extension methods operate on the strings coming in and out of these methods, it seems to me like you wouldn't have to worry about it? – Joshua Carmody Apr 03 '15 at 16:48
  • These classes were designed with the assumption that you cannot not know this. Since it was your code that created the reader or writer, not the framework's code. Maybe that assumption was not correct but we don't have a lot of SO users asking for a workaround. Don't lose info that you need. – Hans Passant Apr 03 '15 at 16:49
  • @JoshuaCarmody: Sometimes I have pretty exciting file formats which use a 0-terminated ASCII string here and byte-prefixed Unicode strings there. I used my extension methods to always specify the encoding with them, and then read the bytes (in case of the ASCII string) and asked the given encoding to make a string out of them. – Ray Apr 03 '15 at 16:56
  • @HansPassant: That is true, but if I just write some extension methods for `BinaryReader/Writer`, they can't know the code which created the reader/writer. – Ray Apr 03 '15 at 16:57
  • @DebugErr Maybe instead of doing extension methods, it would be better to write a wrapper class for BinaryReader/BinaryWriter that encapsulates the additional logic, and use that class instead? – Joshua Carmody Apr 03 '15 at 16:59
  • @JoshuaCarmody: Yeah but then I can also directly inherit from BinaryReader/Writer, intercept the encoding by re-implementing all constructors, since their classes are not sealed. – Ray Apr 03 '15 at 17:00
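
The extension-method approach described in the comments above, where the caller always passes the encoding explicitly, might look roughly like the following. This is only a sketch: the class name `BinaryReaderStringExtensions` and the method `ReadZeroTerminatedString` are hypothetical, and it assumes an encoding such as ASCII or UTF-8 in which a 0 byte never occurs inside a character.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper class, not part of the framework.
public static class BinaryReaderStringExtensions
{
    // Reads bytes up to (and consuming) a single 0 byte, then decodes them
    // with the encoding supplied by the caller.
    // Assumption: a 0 byte cannot appear inside a character, which holds for
    // ASCII and UTF-8 but not for UTF-16 or UTF-32.
    public static string ReadZeroTerminatedString(this BinaryReader reader, Encoding encoding)
    {
        var bytes = new List<byte>();
        byte b;
        // ReadByte throws EndOfStreamException if no terminator is found.
        while ((b = reader.ReadByte()) != 0)
        {
            bytes.Add(b);
        }
        return encoding.GetString(bytes.ToArray());
    }
}
```

Because the encoding is a parameter, the method does not need to know how the reader was constructed, at the cost of the caller having to repeat the encoding on every call.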

2 Answers


Looking at the source code for BinaryReader, I see the constructor is defined as follows:

    public BinaryReader(Stream input, Encoding encoding, bool leaveOpen) {
        if (input==null) {
            throw new ArgumentNullException("input");
        }
        if (encoding==null) {
            throw new ArgumentNullException("encoding");
        }
        if (!input.CanRead)
            throw new ArgumentException(Environment.GetResourceString("Argument_StreamNotReadable"));
        Contract.EndContractBlock();
        m_stream = input;
        m_decoder = encoding.GetDecoder();
        m_maxCharsSize = encoding.GetMaxCharCount(MaxCharBytesSize);
        int minBufferSize = encoding.GetMaxByteCount(1);  // max bytes per one char
        if (minBufferSize < 16) 
            minBufferSize = 16;
        m_buffer = new byte[minBufferSize];
        // m_charBuffer and m_charBytes will be left null.

        // For Encodings that always use 2 bytes per char (or more), 
        // special case them here to make Read() & Peek() faster.
        m_2BytesPerChar = encoding is UnicodeEncoding;
        // check if BinaryReader is based on MemoryStream, and keep this for it's life
        // we cannot use "as" operator, since derived classes are not allowed
        m_isMemoryStream = (m_stream.GetType() == typeof(MemoryStream));
        m_leaveOpen = leaveOpen;

        Contract.Assert(m_decoder!=null, "[BinaryReader.ctor]m_decoder!=null");
    }

So it looks like the encoding itself isn't actually retained anywhere. The class just stores a decoder that is derived from the encoding. m_decoder is defined as follows in the class:

    private Decoder  m_decoder;

You can't access the private variable. Doing a search for that variable in the rest of the class shows it's used in a few places internally, but never returned, so I don't think you can access it anywhere in your derived class without doing some kind of crazy reflection/disassembly thing. It would have to be defined as protected for you to access it. Sorry.

Edit:

There is almost certainly a better way to solve your problem than using reflection to access the private m_decoder variable. And even if you did, it might not get you the encoding, as you noted in the comments. However, if you still want to do it anyway, see this StackOverflow answer on how to access private members with reflection.
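
For illustration only, reading the private decoder field via reflection could be sketched as below. The field name `m_decoder` comes from the source quoted above; `_decoder` is an assumption about newer runtimes, and `BinaryReaderReflection` and `TryGetDecoder` are hypothetical names. Private fields are implementation details and can change or be empty in any release, so the method may well return null, and a Decoder does not expose the Encoding it was created from in any case.

```csharp
using System.IO;
using System.Reflection;
using System.Text;

// Hypothetical helper class, not part of the framework.
public static class BinaryReaderReflection
{
    // Tries to read the reader's private Decoder field.
    // Returns null if no known field is found or the field holds no decoder.
    public static Decoder TryGetDecoder(BinaryReader reader)
    {
        // "m_decoder" matches the quoted .NET Framework source;
        // "_decoder" is an assumed name for other runtimes.
        foreach (string name in new[] { "m_decoder", "_decoder" })
        {
            FieldInfo field = typeof(BinaryReader).GetField(
                name, BindingFlags.Instance | BindingFlags.NonPublic);
            if (field != null)
            {
                return field.GetValue(reader) as Decoder;
            }
        }
        return null;
    }
}
```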

Joshua Carmody
  • Also, I don't think it's possible to get the Encoding back from the Decoder, at least from what I read when scrolling through the source. – Ray Apr 03 '15 at 16:47
  • I don't understand why it even _gets_ an encoding. Surely a `BinaryReader` is meant for reading _bytes_? – Nyerguds Jun 13 '23 at 19:56

If subclassing and intercepting the Encoding in the constructors is even remotely feasible in your scenario, I'd prefer it over potentially unstable reflection hacks.

However, if you must go the reflection route for some reason, here are some pointers I found from the BinaryReader source code you referenced:

nodots
  • I went with the constructor approach. Luckily they only have 3 public constructors. Make sure to remember `new UTF8Encoding()` for the default case where no encoding was specified. – Ray Apr 03 '15 at 17:14
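
A minimal sketch of that constructor approach for the reader side, assuming the three public constructors mentioned in the comment above; the class name `EncodingAwareBinaryReader` and its `Encoding` property are hypothetical:

```csharp
using System.IO;
using System.Text;

// Hypothetical subclass that remembers the Encoding it was constructed with.
public class EncodingAwareBinaryReader : BinaryReader
{
    // The encoding passed to (or defaulted by) the constructor.
    public Encoding Encoding { get; }

    // Default case: no encoding specified, so fall back to UTF-8
    // as suggested in the comment above.
    public EncodingAwareBinaryReader(Stream input)
        : this(input, new UTF8Encoding(), false) { }

    public EncodingAwareBinaryReader(Stream input, Encoding encoding)
        : this(input, encoding, false) { }

    public EncodingAwareBinaryReader(Stream input, Encoding encoding, bool leaveOpen)
        : base(input, encoding, leaveOpen)
    {
        // The base constructor has already rejected a null encoding.
        Encoding = encoding;
    }
}
```

Custom string-reading methods can then be written as members or as extension methods on this subclass and use `reader.Encoding` instead of taking the encoding as a parameter. A writer subclass would follow the same pattern.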