0

I want to read a text file using StreamReader like this:

using (StreamReader sr = new StreamReader(FileName, Encoding.Default))
{
   string text = sr.ReadToEnd();
   string[] lines = text.Split(/* on every 8th "|" - this is the part I don't know */);

   foreach(string line in lines)
   {
     .....
   }
}

My text file contains values divided by "|". Every 8 values form a new group that I need to work with. For example, I have this:

1|2|3|4|5|6|7|8|9|10|11....

What I need is

1|2|3|4|5|6|7|8
9|10|11....

Is it somehow possible to manage this? Regular expressions perhaps? My text file contains hundreds of "|".
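
If a regex is the way to go, I imagine something roughly like this, using the `text` I read above (untested, just to show the idea; it assumes none of the values are empty):

MatchCollection groups = Regex.Matches(text, @"(?:[^|]+\|?){1,8}");

foreach (Match group in groups)
{
    // each match should hold up to 8 values; trim the trailing "|" before splitting
    string[] values = group.Value.TrimEnd('|').Split('|');
    // ... work with the 8 values
}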

Jacob H
  • Any reason why you want to use StreamReader? – helb Mar 18 '14 at 09:19
  • Yes, there are no line breaks. Everything is in one line. Don't know why, but I can't change the structure of this file, it's been given to me like this :/ – Jacob H Mar 18 '14 at 09:19
  • @helb - No reason, it just looked like a good idea to me. If you have a better idea of how to read the file and get a group of values every 8th "|", tell me. – Jacob H Mar 18 '14 at 09:21

2 Answers

1

Maybe you want to get a string[] for every "pseudo-line":

string text = File.ReadAllText(FileName, Encoding.Default);
string[] fields = text.Split('|');
IEnumerable<string[]> lines = fields
    .Select((f, i) => new { Field = f, Index = i }) // keep each field's position
    .GroupBy(x => x.Index / 8)                      // integer division puts every 8 fields in one group
    .Select(g => g.Select(x => x.Field).ToArray());

foreach (string[] lineFields in lines)
    Console.WriteLine(string.Join(", ", lineFields));
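
If you don't need each group materialized as a separate array, the inner `ToArray` can be dropped and each grouping enumerated directly; a minimal sketch of that variant (it reuses the `fields` array from above, and `lazyLines` is just an illustrative name):

IEnumerable<IEnumerable<string>> lazyLines = fields
    .Select((f, i) => new { Field = f, Index = i })
    .GroupBy(x => x.Index / 8, x => x.Field); // each IGrouping<int, string> is already an IEnumerable<string>

foreach (var lineFields in lazyLines)
    Console.WriteLine(string.Join(", ", lineFields));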
Tim Schmelter
  • Very good, following the same line of thought but you are faster – Steve Mar 18 '14 at 09:37
  • This is fine as long as the file is not too large, i.e. suitable for `ReadAllText`. If you are looking for performance, something along these lines should be considered: http://stackoverflow.com/a/15414467/659190. See my answer for a specialization to this problem. – Jodrell Mar 18 '14 at 10:26
  • Tim: great minds think alike - I came up with the same answer independently - removing mine now. – decPL Mar 18 '14 at 10:33
  • Well, thank you very much, this seems to be working. I've never used a construction like this before. The only thing that could be improved is the time it takes to process the whole file. Around 2 minutes :) – Jacob H Mar 18 '14 at 10:53
  • @JacobH: how large is the file? If you don't need that array you could also remove the `ToArray` inside. That would be more efficient. You haven't mentioned what you want to do with the result. If performance is very important and you don't need to process the whole file you should use Jodrell's approach. – Tim Schmelter Mar 18 '14 at 10:57
  • Performance is not that important for me. This will run as a background scheduled task every hour. What I will do with it is update some columns in a database, so this solution is useful for me. My file is around 2 MB. – Jacob H Mar 18 '14 at 11:08
0

If the file is very large, or you want to chunk it into different sizes, you might prefer something more generic.

IEnumerable<IEnumerable<string>> ReadFileInChunks(
        string fileName,
        char[] separators,
        int chunkSize)
{
    string[] bucket = null;
    var count = 0;

    foreach (var item in SplitFile(fileName, separators))
    {
        if (bucket == null)
        {
            bucket = new string[chunkSize];
        }

        bucket[count++] = item;

        if (count != chunkSize)
        {
            continue;
        }

        // the bucket is full, hand it out and start a new one
        yield return bucket;
        bucket = null;
        count = 0;
    }

    // Alternatively, throw an exception if bucket != null
    if (bucket != null)
    {
        // hand out the final, partially filled bucket
        yield return bucket.Take(count);
    }
}

private IEnumerable<string> SplitFile(
        string fileName,
        char[] separators)
{
    var check = new HashSet<int>(separators.Select(c => (int)c));
    var buffer = new StringBuilder();

    using (var reader = new StreamReader(fileName))
    {
        var next = reader.Read();
        while (next != -1)
        {
            if (check.Contains(next))
            {
                // hit a separator, emit the value collected so far
                yield return buffer.ToString();
                buffer.Clear();
                next = reader.Read();
                continue;
            }

            buffer.Append((char)next);

            next = reader.Read();
        }
    }

    if (buffer.Length > 0)
    {
        // emit the trailing value after the last separator
        yield return buffer.ToString();
    }
}

This will read your file one char at a time, which is good if the file is large and not bad if it isn't. It lazily yields the groups in the size you specify.

foreach (var row in ReadFileInChunks(FileName, new[] { '|' }, 8))
{
    foreach (var item in row)
    {
        // ...
    }
}

or, if you really want to re-join the values,

var results = ReadFileInChunks(FileName, new[] { '|' }, 8).Select(row =>
    string.Join("|", row));
Jodrell