
What is the best way to have the functionality of the StreamReader.ReadLine() method, but with custom (String) delimiters?

I'd like to do something like:

String text;
while((text = myStreamReader.ReadUntil("my_delim")) != null)
{
   Console.WriteLine(text);
}

I attempted to make my own using Peek() and StringBuilder, but it's too inefficient. I'm looking for suggestions or possibly an open-source solution.

Thanks.

Edit

I should have clarified this earlier: I have seen this answer; however, I'd prefer not to read the entire file into memory.

Eric
  • Why not use ReadLine() and then search for the delimiter in the string? – Denis Palnitsky Mar 26 '12 at 13:43
  • By using `Peek()` and `StringBuilder` you're basically replicating what `ReadLine()` does inside `StreamReader`, so it seems strange to me that it is so slow. Can you post what you have tried? – digEmAll Mar 26 '12 at 13:46
  • Inefficient? How inefficient? Is the performance lacking noticeably? –  Mar 26 '12 at 13:48
  • Duplicate: http://stackoverflow.com/questions/6655246/how-to-read-text-file-by-particular-line-separator-character – KingCronus Mar 26 '12 at 13:48
  • @AdamKing - not a duplicate - the OP specifically wants a string delimiter, not a char delimiter – Rob Levine Mar 26 '12 at 14:17

4 Answers

4

I figured I would post my own solution. It seems to work pretty well and the code is relatively simple. Feel free to comment.

public static String ReadUntil(this StreamReader sr, String delim)
{
    StringBuilder sb = new StringBuilder();
    bool found = false;

    while (!found && !sr.EndOfStream)
    {
        for (int i = 0; i < delim.Length; i++)
        {
            Char c = (char)sr.Read();
            sb.Append(c);

            if (c != delim[i])
                break;

            if (i == delim.Length - 1)
            {
                sb.Remove(sb.Length - delim.Length, delim.Length);
                found = true;
            }
        }
    }

    return sb.ToString();
}
Eric
  • It would be slightly clearer (to me) if you put a "break" right after "found = true" as well. It requires a little less mental processing. – Jon Coombs Apr 15 '14 at 18:48
  • This solution only works in some cases. For example, if the delimiter is "xy", then this algorithm will miss the delimiter in "axxyb" and it will read until the end of the stream. – Jirka Hanika Jul 08 '14 at 12:45
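
For anyone hitting the "axxyb" case described in the comment above, here is a minimal sketch of one way to avoid it: append every character and compare only the tail of the StringBuilder against the delimiter, so a partial match is never thrown away. This is not the answer author's code. The class name TextReaderExtensions and the helper TailEquals are hypothetical, and returning null at the end of the stream is an assumption made so that it fits the while loop in the question.

```csharp
using System;
using System.IO;
using System.Text;

public static class TextReaderExtensions
{
    // Reads up to (but not including) the next occurrence of delim.
    // Assumption: returns null once the reader is exhausted, so it can
    // drive a loop like the one in the question.
    public static string ReadUntil(this TextReader reader, string delim)
    {
        if (string.IsNullOrEmpty(delim))
            throw new ArgumentException("Delimiter must be non-empty.", nameof(delim));

        int next = reader.Read();
        if (next < 0)
            return null; // nothing left to read

        var sb = new StringBuilder();
        while (next >= 0)
        {
            sb.Append((char)next);
            if (TailEquals(sb, delim))
            {
                sb.Length -= delim.Length; // drop the delimiter itself
                break;
            }
            next = reader.Read();
        }
        return sb.ToString();
    }

    // Hypothetical helper: ordinal comparison of the builder's last
    // delim.Length characters against delim, without calling ToString().
    private static bool TailEquals(StringBuilder sb, string delim)
    {
        if (sb.Length < delim.Length)
            return false;

        int offset = sb.Length - delim.Length;
        for (int i = 0; i < delim.Length; i++)
        {
            if (sb[offset + i] != delim[i])
                return false;
        }
        return true;
    }
}
```

With a StringReader over "axxyb" and the delimiter "xy", successive calls return "ax", then "b", then null. It costs up to delim.Length comparisons per character read, which is a trade of speed for simplicity.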
1
    public static String ReadUntil(this StreamReader streamReader, String delimiter)
    {
        StringBuilder stringBuilder = new StringBuilder();

        while (!streamReader.EndOfStream)
        {
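            stringBuilder.Append(value: (Char) streamReader.Read());

            // Ordinal comparison avoids culture-sensitive matching of the delimiter.
            if (stringBuilder.ToString().EndsWith(value: delimiter, comparisonType: StringComparison.Ordinal))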
            {
                stringBuilder.Remove(stringBuilder.Length - delimiter.Length, delimiter.Length);
                break;
            }
        }

        return stringBuilder.ToString();
    }
1

This code should work for any string separator.

public static IEnumerable<string> ReadChunks(this TextReader reader, string chunkSep)
{
    var sb = new StringBuilder();

    var sepbuffer = new Queue<char>(chunkSep.Length);

    while (reader.Peek() >= 0)
    {
        var nextChar = (char)reader.Read();
        if (nextChar == chunkSep[sepbuffer.Count])
        {
            sepbuffer.Enqueue(nextChar);
            if (sepbuffer.Count == chunkSep.Length)
            {
                yield return sb.ToString();
                sb.Length = 0;
                sepbuffer.Clear();
            }
        }
        else
        {
            sepbuffer.Enqueue(nextChar);
            while (sepbuffer.Count > 0)
            {
                sb.Append(sepbuffer.Dequeue());
                if (sepbuffer.SequenceEqual(chunkSep.Take(sepbuffer.Count)))
                    break;
            }
        }
    }
    yield return sb.ToString() + new string(sepbuffer.ToArray());
}

Disclaimer:

I did a little testing on this and it is actually slower than the ReadLine method, but I suspect that is due to the enqueue/dequeue/SequenceEqual calls, which ReadLine can avoid (because its separators are fixed: \r, \n, or \r\n).

Again, I ran a few tests and it should work, but don't take it as perfect, and feel free to correct it. ;)
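
If the queue overhead mentioned above matters, one option is to track only how many separator characters are currently matched and fall back using a precomputed failure table (the Knuth-Morris-Pratt technique). This is a different technique from the queue-based code in this answer, shown only as a rough sketch. The names StreamChunkExtensions, ReadChunksKmp and BuildFailureTable are hypothetical, and I have not benchmarked it against ReadLine.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class StreamChunkExtensions
{
    // Yields the text between occurrences of sep, reading one char at a time.
    // Like the queue-based version, it yields a final (possibly empty) chunk.
    public static IEnumerable<string> ReadChunksKmp(this TextReader reader, string sep)
    {
        if (string.IsNullOrEmpty(sep))
            throw new ArgumentException("Separator must be non-empty.", nameof(sep));

        int[] fail = BuildFailureTable(sep);
        var sb = new StringBuilder();
        int matched = 0; // number of separator chars matched so far
        int c;

        while ((c = reader.Read()) >= 0)
        {
            char ch = (char)c;
            sb.Append(ch);

            // On a mismatch, fall back to the longest shorter prefix that still matches.
            while (matched > 0 && ch != sep[matched])
                matched = fail[matched - 1];
            if (ch == sep[matched])
                matched++;

            if (matched == sep.Length)
            {
                sb.Length -= sep.Length; // strip the separator from the chunk
                yield return sb.ToString();
                sb.Length = 0;
                matched = 0;
            }
        }

        yield return sb.ToString();
    }

    // fail[i] = length of the longest proper prefix of sep[0..i]
    // that is also a suffix of sep[0..i].
    private static int[] BuildFailureTable(string sep)
    {
        var fail = new int[sep.Length];
        for (int i = 1, k = 0; i < sep.Length; i++)
        {
            while (k > 0 && sep[i] != sep[k])
                k = fail[k - 1];
            if (sep[i] == sep[k])
                k++;
            fail[i] = k;
        }
        return fail;
    }
}
```

For the input "axxyb" with the separator "xy" this yields "ax" and then "b". Each character is appended once and the separator is stripped only when a full match completes, so no per-character queue or LINQ calls are needed.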

digEmAll
1

Here is a simple parser I use where needed (usually, if streaming is not paramount, just reading everything and calling .Split does the job). It's not too optimized but should work fine.
(It's more of a Split-like method; more notes below.)

    public static IEnumerable<string> Split(this Stream stream, string delimiter, StringSplitOptions options)
    {
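        // _bufffer_len is not defined in this snippet; it is assumed to be a
        // buffer-size constant declared elsewhere in the containing class.
        var buffer = new char[_bufffer_len];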
        StringBuilder output = new StringBuilder();
        int read;
        using (var reader = new StreamReader(stream))
        {
            do
            {
                read = reader.ReadBlock(buffer, 0, buffer.Length);
                output.Append(buffer, 0, read);

                var text = output.ToString();
                int id = 0, total = 0;
                while ((id = text.IndexOf(delimiter, id)) >= 0)
                {
                    var line = text.Substring(total, id - total);
                    id += delimiter.Length;
                    if (options != StringSplitOptions.RemoveEmptyEntries || line != string.Empty)
                        yield return line;
                    total = id;
                }
                output.Remove(0, total);
            }
            while (read == buffer.Length);
        }

        if (options != StringSplitOptions.RemoveEmptyEntries || output.Length > 0)
            yield return output.ToString();
    }

You can also switch to char delimiters if needed. Just replace

while ((id = text.IndexOf(delimiter, id)) >= 0)

...with

while ((id = text.IndexOfAny(delimiters, id)) >= 0)

(and use id++ instead of id += delimiter.Length, with the signature this Stream stream, StringSplitOptions options, params char[] delimiters)

It also removes empty entries when StringSplitOptions.RemoveEmptyEntries is passed.
Hope it helps.
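
Putting the char-delimiter changes described above together, the variant might look roughly like the sketch below. The class name StreamSplitExtensions and the BufferLength value of 4096 are assumptions, since the original buffer-size constant isn't shown in the answer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class StreamSplitExtensions
{
    // Assumed value; the buffer size used in the answer is not shown.
    private const int BufferLength = 4096;

    public static IEnumerable<string> Split(this Stream stream, StringSplitOptions options, params char[] delimiters)
    {
        var buffer = new char[BufferLength];
        var output = new StringBuilder();
        int read;
        using (var reader = new StreamReader(stream))
        {
            do
            {
                // Pull the next block and add it to whatever was left over.
                read = reader.ReadBlock(buffer, 0, buffer.Length);
                output.Append(buffer, 0, read);

                var text = output.ToString();
                int id = 0, total = 0;
                while ((id = text.IndexOfAny(delimiters, id)) >= 0)
                {
                    var line = text.Substring(total, id - total);
                    id++; // a char delimiter is always one character long
                    if (options != StringSplitOptions.RemoveEmptyEntries || line.Length > 0)
                        yield return line;
                    total = id;
                }
                // Keep only the unterminated tail for the next block.
                output.Remove(0, total);
            }
            while (read == buffer.Length);
        }

        if (options != StringSplitOptions.RemoveEmptyEntries || output.Length > 0)
            yield return output.ToString();
    }
}
```

For example, splitting a stream containing "a,b;;c" on ',' and ';' with StringSplitOptions.RemoveEmptyEntries yields "a", "b" and "c".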

NSGaga-mostly-inactive