
Questions about converting a string to a stream are abundant.

However, I have yet to see an implementation that does not duplicate the memory occupied by the original string. The simplest suggestion is to convert the string to bytes and initialize a `MemoryStream` from them.
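For reference, the byte-array suggestion looks roughly like this. It is a minimal sketch; the class and method names are hypothetical. The point is that `Encoding.GetBytes` returns a new `byte[]` holding the entire encoded string, so the text exists twice in memory while the stream is alive.

```csharp
using System;
using System.IO;
using System.Text;

public static class ByteArrayConversion
{
    // Hypothetical helper illustrating the "simplest suggestion".
    public static Stream ViaByteArray(string source, Encoding encoding)
    {
        // GetBytes allocates a second buffer containing the whole encoded string.
        byte[] bytes = encoding.GetBytes(source);
        // Read-only MemoryStream over that buffer (no further copy is made here).
        return new MemoryStream(bytes, false);
    }
}
```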

Another suggestion is to write it into a `StreamWriter` wrapping a `MemoryStream`.
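A sketch of the `StreamWriter` variant follows (names are hypothetical; it assumes the `StreamWriter(Stream, Encoding, int, bool leaveOpen)` constructor available since .NET 4.5). The writer encodes through small internal buffers, but the `MemoryStream` still ends up holding the complete encoded text, so memory use is no better than the byte-array approach.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamWriterConversion
{
    // Hypothetical helper illustrating the StreamWriter-over-MemoryStream suggestion.
    public static Stream ViaStreamWriter(string source, Encoding encoding)
    {
        var memory = new MemoryStream();
        // leaveOpen: true so that disposing the writer does not close the MemoryStream.
        // Note: an encoding with a preamble (e.g. Encoding.UTF8) also gets a BOM written first.
        using (var writer = new StreamWriter(memory, encoding, 1024, true))
        {
            writer.Write(source);
        } // Dispose flushes the writer; the MemoryStream now contains every encoded byte.
        memory.Position = 0;
        return memory;
    }
}
```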

None of them is memory efficient.

The reason I am bringing this up is that I have to deal with a legacy system that, out of sheer stupidity, produces a single huge string. Now I need to apply certain post-processing to this string and write it to a file, and I just do not want to duplicate the damn thing. So I am looking for a memory-efficient way to do it.

Asked by mark.
  • What operations do you need the stream to support? Is it just `Read`, or do you also need `Seek`? I presume not `Write`, since strings are immutable (so I'm surprised to see `StreamWriter` suggested in your question). – Damien_The_Unbeliever Dec 08 '15 at 15:51
  • There is a [`StringReader`](https://msdn.microsoft.com/en-us/library/system.io.stringreader(v=vs.110).aspx) that, whilst it's not a `Stream`, does support "stream-like" operations. – Damien_The_Unbeliever Dec 08 '15 at 15:54
  • Only read. I prefer `Stream`, since it is more suitable for binary transformations. – mark Dec 08 '15 at 15:58
  • Why not just repeatedly extract reasonable-sized portions of the string using `String.Substring` and post-process them? – Joe Dec 08 '15 at 16:23
  • I could always do that, but there is always hope that something like this already exists. I do not like reinventing the wheel. – mark Dec 08 '15 at 16:36
  • To the downvoter - care to rationalize? – mark Dec 08 '15 at 16:36
  • How about encoding? I mean, a string can be converted to bytes (and thus to a stream) in many different ways. – Ivan Stoev Dec 08 '15 at 18:23
  • The encoding should be given from the outside as an argument. My problem is that converting implies creating a separate byte buffer and copying the string data over there, thus duplicating the memory. – mark Dec 09 '15 at 15:38
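Joe's suggestion of processing the string in portions can be sketched without ever materializing the full byte array. This is an illustrative alternative, not the accepted answer: the class, method and `chunkChars` parameter are hypothetical. It reads the string through a `StringReader` (which keeps a reference to the string rather than copying it) and encodes each chunk with a stateful `System.Text.Encoder`, which holds back a trailing high surrogate until the next chunk arrives.

```csharp
using System;
using System.IO;
using System.Text;

public static class ChunkedStringEncoding
{
    // Hypothetical helper: encodes `source` into `destination` through fixed-size
    // buffers, so peak extra memory is O(chunkChars) instead of O(source.Length).
    public static void WriteEncoded(string source, Stream destination, Encoding encoding, int chunkChars = 4096)
    {
        Encoder encoder = encoding.GetEncoder();
        char[] chars = new char[chunkChars];
        // GetMaxByteCount(n) also covers a surrogate carried over from the previous chunk.
        byte[] bytes = new byte[encoding.GetMaxByteCount(chunkChars)];
        using (var reader = new StringReader(source))
        {
            int charCount;
            while ((charCount = reader.Read(chars, 0, chars.Length)) > 0)
            {
                // flush: false lets the Encoder keep a dangling high surrogate
                // and combine it with the low surrogate at the start of the next chunk.
                int byteCount = encoder.GetBytes(chars, 0, charCount, bytes, 0, false);
                destination.Write(bytes, 0, byteCount);
            }
            // Final flush emits whatever is still buffered inside the Encoder.
            int tail = encoder.GetBytes(chars, 0, 0, bytes, 0, true);
            destination.Write(bytes, 0, tail);
        }
    }
}
```

Passing a `FileStream` as `destination` writes the encoded text straight to disk. Post-processing that needs to see the bytes would have to happen inside the loop, one chunk at a time.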

1 Answer


It's not hard to write a custom `Stream`-derived class, but the challenging part in this particular case is the need for `Encoding` support. Here is a read-only, forward-only implementation that uses a small buffer to hold the bytes of a single character when they do not fit into the caller's buffer:

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;

public static class StringUtils
{
    // Exposes the string as a read-only, forward-only Stream that encodes
    // characters on demand instead of copying the whole string into a byte[].
    public static Stream AsStream(this string source, Encoding encoding = null)
    {
        return string.IsNullOrEmpty(source) ? Stream.Null : new StringStream(source, encoding ?? Encoding.UTF8);
    }

    class StringStream : Stream
    {
        string source;
        Encoding encoding;
        int position, length;           // measured in encoded bytes
        int charPosition;               // index of the next char of source to encode
        int maxBytesPerChar;
        byte[] encodeBuffer;            // bytes of one character (or surrogate pair)
        int encodeOffset, encodeCount;  // unread part of encodeBuffer
        int encodeChars;                // number of chars encodeBuffer represents

        internal StringStream(string source, Encoding encoding)
        {
            this.source = source;
            this.encoding = encoding;
            length = encoding.GetByteCount(source);
            maxBytesPerChar = encoding.GetMaxByteCount(1);
        }

        public override bool CanRead { get { return true; } }
        public override bool CanSeek { get { return false; } }
        public override bool CanWrite { get { return false; } }
        public override long Length { get { return length; } }
        public override void SetLength(long value) { throw new NotSupportedException(); }
        public override long Position { get { return position; } set { throw new NotSupportedException(); } }
        public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
        public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
        public override void Flush() { }

        public override int Read(byte[] buffer, int offset, int count)
        {
            int readCount = 0;
            for (int byteCount; readCount < count && position < length; position += byteCount, readCount += byteCount)
            {
                if (encodeCount == 0)
                {
                    // Encode as many whole chars as are guaranteed to fit into the caller's buffer.
                    int charCount = Math.Min((count - readCount) / maxBytesPerChar, source.Length - charPosition);
                    // Do not end the chunk between the halves of a surrogate pair: Encoding.GetBytes
                    // is stateless, so each half would be encoded as a replacement character.
                    if (charCount > 0 && charPosition + charCount < source.Length
                        && char.IsHighSurrogate(source[charPosition + charCount - 1]))
                        charCount--;
                    if (charCount > 0)
                    {
                        byteCount = encoding.GetBytes(source, charPosition, charCount, buffer, offset + readCount);
                        Debug.Assert(byteCount > 0 && byteCount <= (count - readCount));
                        charPosition += charCount;
                        continue;
                    }
                    // Too little room left: encode one character (or surrogate pair)
                    // into the side buffer and hand its bytes out piecewise.
                    encodeChars = charPosition + 1 < source.Length
                        && char.IsSurrogatePair(source[charPosition], source[charPosition + 1]) ? 2 : 1;
                    if (encodeBuffer == null) encodeBuffer = new byte[encoding.GetMaxByteCount(2)];
                    encodeCount = encoding.GetBytes(source, charPosition, encodeChars, encodeBuffer, encodeOffset = 0);
                    Debug.Assert(encodeCount > 0);
                }
                byteCount = Math.Min(encodeCount, count - readCount);
                Buffer.BlockCopy(encodeBuffer, encodeOffset, buffer, offset + readCount, byteCount);
                encodeOffset += byteCount;
                if ((encodeCount -= byteCount) == 0) charPosition += encodeChars;
            }
            return readCount;
        }
    }
}
```
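One pitfall with encoding a string in pieces is that `Encoding.GetBytes` is stateless: if a chunk boundary falls between the two halves of a surrogate pair (any character outside the Basic Multilingual Plane, such as an emoji), each half is encoded on its own as a replacement character, and the output no longer matches `GetByteCount` of the whole string. The small, hypothetical demo below encodes a string in two independent calls so the effect can be observed; it is only an illustration of the pitfall, not part of the answer's class.

```csharp
using System;
using System.Text;

public static class SurrogateSplitDemo
{
    // Hypothetical demo: encodes s[0..splitAt) and s[splitAt..] in two separate,
    // stateless GetBytes calls and concatenates the results.
    public static byte[] EncodeInTwoCalls(string s, int splitAt, Encoding encoding)
    {
        byte[] first = encoding.GetBytes(s.Substring(0, splitAt));
        byte[] second = encoding.GetBytes(s.Substring(splitAt));
        byte[] result = new byte[first.Length + second.Length];
        Buffer.BlockCopy(first, 0, result, 0, first.Length);
        Buffer.BlockCopy(second, 0, result, first.Length, second.Length);
        return result;
    }
}
```

With `s = "a\U0001F600b"`, splitting at index 1 reproduces the one-shot encoding, while splitting at index 2 (between the high and low surrogate) does not. Any chunked reader therefore has to keep surrogate pairs together or use a stateful `Encoder`.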
Answered by Ivan Stoev.