28

Is there a way to read ahead one line to test if the next line contains specific tag data?

I'm dealing with a format that has a start tag but no end tag.

I would like to read a line add it to a structure then test the line below to make sure it not a new "node" and if it isn't keep adding if it is close off that struct and make a new one

the only solution i can think of is to have two stream readers going at the same time kinda suffling there way along lock step but that seems wastefull (if it will even work)

i need something like peek but peekline

Casebash
  • 114,675
  • 90
  • 247
  • 350
Crash893
  • 11,428
  • 21
  • 88
  • 123
  • I think PeekLine approach is not a good way to deal with "no end tag" problem, because you always have to peek line and test wherher new structure begins. I'd like to set the position of the stream to the previous line and next ReadLine will return the line you have read. – Gqqnbig Sep 09 '13 at 09:03

4 Answers4

31

The problem is the underlying stream may not even be seekable. If you take a look at the stream reader implementation it uses a buffer so it can implement TextReader.Peek() even if the stream is not seekable.

You could write a simple adapter that reads the next line and buffers it internally, something like this:

 public class PeekableStreamReaderAdapter
    {
        private StreamReader Underlying;
        private Queue<string> BufferedLines;

        public PeekableStreamReaderAdapter(StreamReader underlying)
        {
            Underlying = underlying;
            BufferedLines = new Queue<string>();
        }

        public string PeekLine()
        {
            string line = Underlying.ReadLine();
            if (line == null)
                return null;
            BufferedLines.Enqueue(line);
            return line;
        }


        public string ReadLine()
        {
            if (BufferedLines.Count > 0)
                return BufferedLines.Dequeue();
            return Underlying.ReadLine();
        }
    }
Nic Strong
  • 6,532
  • 4
  • 35
  • 50
  • 2
    I would initialize the `BufferedLines` before usage :) and also, I'd use another name for PeekLine(), as the name suggests that it would always return the same line (the next line from the position of the last ReadLine). Voted +1 already – tofi9 May 09 '09 at 05:42
  • 1
    Thanks added the initializer. Never even compiled the code. Maybe something like LookAheadReadLine() might be more appropriate. – Nic Strong May 09 '09 at 05:54
  • 7
    I expanded this slightly so the class inherits from TextReader: https://gist.github.com/1317325 – Andy Edinborough Oct 26 '11 at 18:39
  • 1
    @AndyEdinborough loving the PeekableTextReader – Chris Marisic Nov 30 '11 at 17:50
  • @AndyEdinborough You just saved me two hours, great work and many thanks! – tempy Dec 05 '11 at 16:14
  • @AndyEdinborough Very nice (+1 to you both). But after going to the trouble of writing nice reusable code, wouldn't you also want to implement Seek()? Also, I'm not quite understanding why you used MemoryStream as a 'buffer'. Won't that grow to be as big as the whole file, at which point it would've been simpler to just load the whole file up front? – Jon Coombs Sep 30 '13 at 15:39
4

You could store the position accessing StreamReader.BaseStream.Position, then read the line next line, do your test, then seek to the position before you read the line:

            // Peek at the next line
            long peekPos = reader.BaseStream.Position;
            string line = reader.ReadLine();

            if (line.StartsWith("<tag start>"))
            {
                // This is a new tag, so we reset the position
                reader.BaseStream.Seek(pos);    

            }
            else
            {
                // This is part of the same node.
            }

This is a lot of seeking and re-reading the same lines. Using some logic, you may be able to avoid this altogether - for instance, when you see a new tag start, close out the existing structure and start a new one - here's a basic algorithm:

        SomeStructure myStructure = null;
        while (!reader.EndOfStream)
        {
            string currentLine = reader.ReadLine();
            if (currentLine.StartsWith("<tag start>"))
            {
                // Close out existing structure.
                if (myStructure != null)
                {
                    // Close out the existing structure.
                }

                // Create a new structure and add this line.
                myStructure = new Structure();                   
                // Append to myStructure.
            }
            else
            {
                // Add to the existing structure.
                if (myStructure != null)
                {
                    // Append to existing myStructure
                }
                else
                {
                    // This means the first line was not part of a structure.
                    // Either handle this case, or throw an exception.
                }
            }
        }
seddy
  • 79
  • 3
  • 2
    Looking here: it appears that the position of the underlying stream may not always match what StreamReader:http://stackoverflow.com/questions/1737591/streamreader-c-peek – Casebash Feb 24 '10 at 04:12
1

Why the difficulty? Return the next line, regardless. Check if it is a new node, if not, add it to the struct. If it is, create a new struct.

// Not exactly C# but close enough
Collection structs = new Collection();
Struct struct;
while ((line = readline()) != null)) {
    if (IsNode(line)) {
        if (struct != null) structs.add(struct);
        struct = new Struct();
        continue;
    }
    // Whatever processing you need to do
    struct.addLine(line);
}
structs.add(struct); // Add the last one to the collection

// Use your structures here
foreach s in structs {

}
Josh Smeaton
  • 47,939
  • 24
  • 129
  • 164
0

Here is what i go so far. I went more of the split route than the streamreader line by line route.

I'm sure there are a few places that are dieing to be more elegant but for right now it seems to be working.

Please let me know what you think

struct INDI
    {
        public string ID;
        public string Name;
        public string Sex;
        public string BirthDay;
        public bool Dead;


    }
    struct FAM
    {
        public string FamID;
        public string type;
        public string IndiID;
    }
    List<INDI> Individuals = new List<INDI>();
    List<FAM> Family = new List<FAM>();
    private void button1_Click(object sender, EventArgs e)
    {
        string path = @"C:\mostrecent.ged";
        ParseGedcom(path);
    }

    private void ParseGedcom(string path)
    {
        //Open path to GED file
        StreamReader SR = new StreamReader(path);

        //Read entire block and then plit on 0 @ for individuals and familys (no other info is needed for this instance)
        string[] Holder = SR.ReadToEnd().Replace("0 @", "\u0646").Split('\u0646');

        //For each new cell in the holder array look for Individuals and familys
        foreach (string Node in Holder)
        {

            //Sub Split the string on the returns to get a true block of info
            string[] SubNode = Node.Replace("\r\n", "\r").Split('\r');
            //If a individual is found
            if (SubNode[0].Contains("INDI"))
            {
                //Create new Structure
                INDI I = new INDI();
                //Add the ID number and remove extra formating
                I.ID = SubNode[0].Replace("@", "").Replace(" INDI", "").Trim();
                //Find the name remove extra formating for last name
                I.Name = SubNode[FindIndexinArray(SubNode, "NAME")].Replace("1 NAME", "").Replace("/", "").Trim(); 
                //Find Sex and remove extra formating
                I.Sex = SubNode[FindIndexinArray(SubNode, "SEX")].Replace("1 SEX ", "").Trim();

                //Deterine if there is a brithday -1 means no
                if (FindIndexinArray(SubNode, "1 BIRT ") != -1)
                {
                    // add birthday to Struct 
                    I.BirthDay = SubNode[FindIndexinArray(SubNode, "1 BIRT ") + 1].Replace("2 DATE ", "").Trim();
                }

                // deterimin if there is a death tag will return -1 if not found
                if (FindIndexinArray(SubNode, "1 DEAT ") != -1)
                {
                    //convert Y or N to true or false ( defaults to False so no need to change unless Y is found.
                    if (SubNode[FindIndexinArray(SubNode, "1 DEAT ")].Replace("1 DEAT ", "").Trim() == "Y")
                    {
                        //set death
                        I.Dead = true;
                    }
                }
                //add the Struct to the list for later use
                Individuals.Add(I);
            }

            // Start Family section
            else if (SubNode[0].Contains("FAM"))
            {
                //grab Fam id from node early on to keep from doing it over and over
                string FamID = SubNode[0].Replace("@ FAM", "");

                // Multiple children can exist for each family so this section had to be a bit more dynaimic

                // Look at each line of node
                foreach (string Line in SubNode)
                {
                    // If node is HUSB
                    if (Line.Contains("1 HUSB "))
                    {

                        FAM F = new FAM();
                        F.FamID = FamID;
                        F.type = "PAR";
                        F.IndiID = Line.Replace("1 HUSB ", "").Replace("@","").Trim();
                        Family.Add(F);
                    }
                        //If node for Wife
                    else if (Line.Contains("1 WIFE "))
                    {
                        FAM F = new FAM();
                        F.FamID = FamID;
                        F.type = "PAR";
                        F.IndiID = Line.Replace("1 WIFE ", "").Replace("@", "").Trim();
                        Family.Add(F);
                    }
                        //if node for multi children
                    else if (Line.Contains("1 CHIL "))
                    {
                        FAM F = new FAM();
                         F.FamID = FamID;
                        F.type = "CHIL";
                        F.IndiID = Line.Replace("1 CHIL ", "").Replace("@", "");
                        Family.Add(F);
                    }
                }
            }
        }
    }

    private int FindIndexinArray(string[] Arr, string search)
    {
        int Val = -1;
        for (int i = 0; i < Arr.Length; i++)
        {
            if (Arr[i].Contains(search))
            {
                Val = i;
            }
        }
        return Val;
    }
Crash893
  • 11,428
  • 21
  • 88
  • 123