I have the following log file:

START:SOME_STRING
BL:2
LK:3
LH:5
end
START:SOME_STRING
BL:5
LK:6
LH:6
end

Which has multiple START: -> end structures inside. Is there a better 'non-sloppy' way of parsing this file rather than reading line by line and using SPLIT?

rvk

3 Answers

You can try to formalize your ini-file's grammar and use one of the parser generators. See this question for more detail.

Be aware, however, that for a grammar as simple as yours it might be easier to parse manually :-P

class IniEntry
{
    public int BL;
    public int LK;
    public int LH;
    // Must be public: Parse() calls it from outside this class.
    public IniEntry Clone() { return new IniEntry { BL = BL, LK = LK, LH = LH }; }
}

IEnumerable<IniEntry> Parse()
{
    IniEntry ie = new IniEntry();
    while (ParseEntry(out ie))
        yield return ie.Clone();
}

bool ParseEntry(out IniEntry ie)
{
    ie = new IniEntry();
    return ParseStart(ie) &&
               ParseBL(ie) &&
               ParseLK(ie) &&
               ParseLH(ie) &&
               ParseEnd(ie);
}

bool ParseStart(IniEntry ie)
{
    string dummy;
    return ParseLine("START", out dummy);
}

bool ParseBL(IniEntry ie)
{
    string BL;
    return ParseLine("BL", out BL) && int.TryParse(BL, out ie.BL);
}

bool ParseLK(IniEntry ie)
{
    string LK;
    return ParseLine("LK", out LK) && int.TryParse(LK, out ie.LK);
}

bool ParseLH(IniEntry ie)
{
    string LH;
    // LH is an int field, so it is parsed with int.TryParse like BL and LK.
    return ParseLine("LH", out LH) && int.TryParse(LH, out ie.LH);
}

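// Assumed shape of the terminator check: the "end" line has no "key:value" form.
bool ParseEnd(IniEntry ie)
{
    return GetNextLine() == "end";
}

bool ParseLine(string key, out string value)
{
    value = null;                   // an out parameter must be assigned on every path
    string line = GetNextLine();    // assumed to return null at end of input
    if (line == null) return false;
    var parts = line.Split(':');
    if (parts.Length != 2) return false;
    if (parts[0] != key) return false;
    value = parts[1];
    return true;
}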

etc.

Vlad

This is a good candidate for a while loop and a state machine. With this approach you would even use less memory and get better performance than with string.Split().
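
As a rough illustration of the while loop plus state machine idea, here is a minimal sketch. The names LogEntry, LogParser and State are made up for this example, and it assumes every value inside a block is an integer and that unknown or malformed lines can simply be skipped; adjust that to your real format.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical container for one START: ... end block.
public sealed class LogEntry
{
    public string Name;
    public int BL, LK, LH;
}

public static class LogParser
{
    private enum State { OutsideBlock, InsideBlock }

    // Reads one line at a time, so only the current block is held in memory.
    public static IEnumerable<LogEntry> Parse(TextReader reader)
    {
        State state = State.OutsideBlock;
        LogEntry current = null;
        string line;

        while ((line = reader.ReadLine()) != null)
        {
            line = line.Trim();
            if (line.Length == 0) continue;

            switch (state)
            {
                case State.OutsideBlock:
                    // Only a START: line opens a block; anything else is ignored.
                    if (line.StartsWith("START:", StringComparison.Ordinal))
                    {
                        current = new LogEntry { Name = line.Substring("START:".Length) };
                        state = State.InsideBlock;
                    }
                    break;

                case State.InsideBlock:
                    if (line == "end")
                    {
                        yield return current;
                        current = null;
                        state = State.OutsideBlock;
                        break;
                    }
                    int colon = line.IndexOf(':');
                    if (colon < 0) break;               // assumption: skip malformed lines
                    string key = line.Substring(0, colon);
                    int value;
                    if (!int.TryParse(line.Substring(colon + 1), out value)) break;
                    if (key == "BL") current.BL = value;
                    else if (key == "LK") current.LK = value;
                    else if (key == "LH") current.LH = value;
                    break;
            }
        }
        // A block that never reaches "end" is dropped in this sketch.
    }
}
```

You would call it with something like foreach (var e in LogParser.Parse(File.OpenText(path))) { ... }. Because Parse is an iterator, entries are produced lazily while the file is being read.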

Karim Agha

If it is certain that the START/end markers are always matched (apologies, my C# is embarrassing, so plain English):

Read the whole file with StreamReader.ReadToEnd (or File.ReadAllText)
Parse the whole thing in one go with a regular expression
Iterate over regex results

The regex would be something like "START:([^\r\n]*)\r?\nBL:(\d+)\r?\nLK:(\d+)\r?\nLH:(\d+)\r?\nend", off the top of my head; you'll need to validate and adjust it according to how your parameters BL/LK etc. occur.
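
The three steps above could look roughly like this. It is a sketch, not tested against your real files: ParsedBlock and RegexLogParser are names invented for the example, the pattern assumes the keys always appear in the order BL, LK, LH with non-negative integer values, and it uses [0-9]{1,9} so that int.Parse cannot overflow.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical container for one matched block.
public sealed class ParsedBlock
{
    public string Name;
    public int BL, LK, LH;
}

public static class RegexLogParser
{
    // One match per START: ... end block; \r?\n accepts both LF and CRLF files.
    private static readonly Regex BlockPattern = new Regex(
        @"START:(?<name>[^\r\n]*)\r?\n" +
        @"BL:(?<bl>[0-9]{1,9})\r?\n" +
        @"LK:(?<lk>[0-9]{1,9})\r?\n" +
        @"LH:(?<lh>[0-9]{1,9})\r?\n" +
        @"end",
        RegexOptions.CultureInvariant);

    // Takes the complete file contents and returns every block found in it.
    public static List<ParsedBlock> ParseAll(string text)
    {
        var result = new List<ParsedBlock>();
        foreach (Match m in BlockPattern.Matches(text))
        {
            result.Add(new ParsedBlock
            {
                Name = m.Groups["name"].Value,
                BL = int.Parse(m.Groups["bl"].Value),
                LK = int.Parse(m.Groups["lk"].Value),
                LH = int.Parse(m.Groups["lh"].Value)
            });
        }
        return result;
    }
}
```

Usage would be along the lines of RegexLogParser.ParseAll(File.ReadAllText(path)). Note that blocks which do not fit the pattern are silently skipped rather than reported, and the whole file has to fit in memory.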

smirkingman
  • This doesn't sound efficient when some of the files I need to read are over 6000 lines. Unless I am mistaken? – rvk Nov 03 '10 at 09:18
  • That's really a bad solution. Just imagine the file has a size of several hundred megs. Better read it line by line (or chunk by chunk) and use some kind of state machine. – Oliver Nov 03 '10 at 09:27
  • Is memory really a problem? Even if it momentarily takes 1 GB of memory, so what? Of course, a state machine is more elegant, but this solves the problem quickly – smirkingman Nov 03 '10 at 09:39