I have a device with a built-in logger that generates status messages and keeps appending them to a .txt file. The messages include device status and network status, among many other things. The data in the file looks something like this:

 <XML><DSTATUS>1,4,7,,5</DSTATUS><EVENT> hello,there,my,name,is,jack,</EVENT>
     last,name,missing,above <ANOTHERTAG>3,6,7,,8,4</ANOTHERTAG> </XML>

 <XML><DSTATUS>1,5,7,,3</DSTATUS><EVENT>hello,there,my,name,is,mary,jane</EVENT>
     last,name,not,missing,above<ANOTHERTAG>3,6,7,,8,4</ANOTHERTAG></XML>

    ... goes on

Note that it is not well-formed XML. Also, one element can hold multiple parameters, some of which may be blank, for example: <NETWORKSTAT>1,456,3,6,,7</NETWORKSTAT>

My objective is to write something in C# WPF that takes this text file, processes the data in it, and creates a .csv file with one event per line. For the brief example above, the first line in the CSV file would be:

1,4,7,,5,hello,there,my,name,is,jack,,last,name,missing,above,3,6,7,,8,4

I do not need help with basic C#; I know how to read a file, etc. What I have no clue about is how to approach the parsing, processing, and converting. I'm fairly new to C#, so I'm not sure which direction to go. Any help will be appreciated!

sparta93
  • do some research on stack based parsers http://stackoverflow.com/questions/13466600/how-would-i-go-about-implementing-a-simple-stack-based-programming-language – JJS Jun 03 '15 at 03:42

3 Answers

Since each top-level XML node in your file is well-formed on its own, you can use an XmlReader with XmlReaderSettings.ConformanceLevel = ConformanceLevel.Fragment to iterate through the top-level nodes in the file and load each one with LINQ to XML:

    public static IEnumerable<string> XmlFragmentsToCSV(string path)
    {
        using (var textReader = new StreamReader(path, Encoding.UTF8))
            foreach (var line in XmlFragmentsToCSV(textReader))
                yield return line;
    }

    public static IEnumerable<string> XmlFragmentsToCSV(TextReader textReader)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ConformanceLevel = ConformanceLevel.Fragment;

        using (XmlReader reader = XmlReader.Create(textReader, settings))
        {
            while (reader.Read())
            {   // Skip whitespace
                if (reader.NodeType == XmlNodeType.Element) 
                {
                    using (var subReader = reader.ReadSubtree())
                    {
                        var element = XElement.Load(subReader);
                        yield return string.Join(",", element.DescendantNodes().OfType<XText>().Select(n => n.Value.Trim()).Where(t => !string.IsNullOrEmpty(t)).ToArray());
                    }
                }
            }
        }
    }

To precisely match the output you wanted, I had to trim the whitespace at the beginning and end of each text node value.

Also, the Where(t => !string.IsNullOrEmpty(t)) clause is to skip the whitespace node corresponding to the space here: </ANOTHERTAG> </XML>. If that space doesn't exist in the real file, you can omit that clause.
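
For anyone who wants to try the approach without a log file, here is a minimal, self-contained sketch that runs the same technique over an in-memory string. The class name FragmentDemo and the sample string are assumptions made for illustration (the string is the question's two sample records joined together); the method body is a condensed copy of the one above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static class FragmentDemo // hypothetical name, for illustration only
{
    // Condensed copy of the method above: one CSV line per top-level fragment.
    public static IEnumerable<string> XmlFragmentsToCsv(TextReader textReader)
    {
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        using (XmlReader reader = XmlReader.Create(textReader, settings))
        {
            while (reader.Read())
            {
                // Only elements start a record; whitespace between records is skipped.
                if (reader.NodeType != XmlNodeType.Element)
                    continue;

                using (XmlReader subReader = reader.ReadSubtree())
                {
                    XElement element = XElement.Load(subReader);
                    // Collect every text node, including text that sits outside child tags.
                    yield return string.Join(",", element.DescendantNodes()
                        .OfType<XText>()
                        .Select(n => n.Value.Trim())
                        .Where(t => !string.IsNullOrEmpty(t)));
                }
            }
        }
    }

    static void Main()
    {
        // Sample data taken from the question, joined into one string.
        string sample =
            "<XML><DSTATUS>1,4,7,,5</DSTATUS><EVENT> hello,there,my,name,is,jack,</EVENT>" +
            "last,name,missing,above <ANOTHERTAG>3,6,7,,8,4</ANOTHERTAG> </XML>" +
            "<XML><DSTATUS>1,5,7,,3</DSTATUS><EVENT>hello,there,my,name,is,mary,jane</EVENT>" +
            "last,name,not,missing,above<ANOTHERTAG>3,6,7,,8,4</ANOTHERTAG></XML>";

        foreach (string line in XmlFragmentsToCsv(new StringReader(sample)))
            Console.WriteLine(line);
    }
}
```

To produce the actual file, the sequence returned for a real path could be passed straight to File.WriteAllLines, which accepts an IEnumerable<string>.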

dbc

Here is my solution using LINQ to XML. I create an XDocument by wrapping the fragments in a Root tag.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.csv";
        static void Main(string[] args)
        {
            string input =
                "<XML><DSTATUS>1,4,7,,5</DSTATUS><EVENT> hello,there,my,name,is,jack,</EVENT>" +
                   "last,name,missing,above <ANOTHERTAG>3,6,7,,8,4</ANOTHERTAG> </XML>" +

                "<XML><DSTATUS>1,5,7,,3</DSTATUS><EVENT>hello,there,my,name,is,mary,jane</EVENT>" +
                   "last,name,not,missing,above<ANOTHERTAG>3,6,7,,8,4</ANOTHERTAG></XML>";

            input = "<Root>" + input + "</Root>";

            XDocument doc = XDocument.Parse(input);

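            List<XElement> rows = doc.Descendants("XML").ToList();

            // The using block flushes and closes the file even if an exception is thrown.
            using (StreamWriter writer = new StreamWriter(FILENAME))
            {
                foreach (XElement row in rows)
                {
                    // Elements() returns child elements only, so text outside any tag is not included.
                    string[] elements = row.Elements().Select(x => x.Value).ToArray();
                    writer.WriteLine(string.Join(",", elements));
                }
            }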

        }
    }
}
jdweng
  • jdweng, thank you. Your approach works flawlessly. But there's one issue that I didn't talk about in my post explicitly... If you look at my sample log data, "last,name,missing,above" the following data is not embedded within any tags, so currently your approach just ignores those 4 words however I need to include them as well in my csv data. – sparta93 Jun 04 '15 at 13:10
  • Will really appreciate your help with this – sparta93 Jun 04 '15 at 17:45
  • Solution above used XML Linq which doesn't support Text not inside tags. Posted below new solution. – jdweng Jun 05 '15 at 06:20

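Because some of the text sits outside any child tag (mixed content), I switched from the LINQ to XML solution to XmlDocument. The earlier code used Elements(), which returns only child elements and so skipped the loose text; iterating ChildNodes picks up both the elements and the text between them. (LINQ to XML can also reach that text through Nodes(), but Elements() alone does not.)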

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.csv";
        static void Main(string[] args)
        {
            string input =
                "<XML><DSTATUS>1,4,7,,5</DSTATUS><EVENT> hello,there,my,name,is,jack,</EVENT>" +
                   "last,name,missing,above <ANOTHERTAG>3,6,7,,8,4</ANOTHERTAG> </XML>" +

                "<XML><DSTATUS>1,5,7,,3</DSTATUS><EVENT>hello,there,my,name,is,mary,jane</EVENT>" +
                   "last,name,not,missing,above<ANOTHERTAG>3,6,7,,8,4</ANOTHERTAG></XML>";

            input = "<Root>" + input + "</Root>";

            XmlDocument doc = new XmlDocument();
            doc.LoadXml(input);

            XmlNodeList rows = doc.GetElementsByTagName("XML");

            // The using block flushes and closes the file even if an exception is thrown.
            using (StreamWriter writer = new StreamWriter(FILENAME))
            {
                foreach (XmlNode row in rows)
                {
                    // ChildNodes includes text nodes as well as elements, so the loose text is kept.
                    List<string> children = new List<string>();
                    foreach (XmlNode child in row.ChildNodes)
                    {
                        children.Add(child.InnerText.Trim());
                    }

                    writer.WriteLine(string.Join(",", children.ToArray()));
                }
            }

        }
    }
}
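
For comparison, the same child-node walk can be sketched with LINQ to XML by iterating Nodes() instead of Elements(). This is an illustrative variant rather than the code above; the names MixedContentDemo and RowsToCsv are hypothetical, and it assumes the fragments fit in memory as a single string, as in the sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class MixedContentDemo // hypothetical name, for illustration only
{
    // Returns one CSV line per <XML> record, keeping text that is not inside a child tag.
    public static IEnumerable<string> RowsToCsv(string fragments)
    {
        // Wrap the fragments in a single root so they parse as one document.
        XDocument doc = XDocument.Parse("<Root>" + fragments + "</Root>");

        foreach (XElement row in doc.Root.Elements("XML"))
        {
            // Nodes() yields child elements and loose text nodes in document order.
            IEnumerable<string> fields = row.Nodes()
                .Select(node => (node as XElement)?.Value ?? (node as XText)?.Value)
                .Where(value => value != null) // drop comments and other node types
                .Select(value => value.Trim());

            yield return string.Join(",", fields);
        }
    }

    static void Main()
    {
        // First sample record from the question.
        string sample =
            "<XML><DSTATUS>1,4,7,,5</DSTATUS><EVENT> hello,there,my,name,is,jack,</EVENT>" +
            "last,name,missing,above <ANOTHERTAG>3,6,7,,8,4</ANOTHERTAG> </XML>";

        foreach (string line in RowsToCsv(sample))
            Console.WriteLine(line);
    }
}
```

XDocument.Parse drops whitespace-only text by default, so the space before the closing XML tag should not produce an extra field here; an element that is genuinely empty would still contribute a blank column.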
jdweng