For a university project I need to parse an XML result file that looks something like this:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ocrsdk.com/schema/resultDescription-1.0.xsd http://ocrsdk.com/schema/resultDescription-1.0.xsd" xmlns="http://ocrsdk.com/schema/resultDescription-1.0.xsd">
  <page index="0">
    <text id="print" left="160" top="349" right="339" bottom="384">
      <value>Vertraqsnummer:</value>
      <line left="167" top="366" right="326" bottom="384">
        <char left="167" top="366" right="180" bottom="382">V</char>
        <char left="182" top="370" right="192" bottom="382">e</char>
        <char left="194" top="370" right="199" bottom="382">r</char>
        <char left="199" top="367" right="205" bottom="382">t</char>
        <char left="206" top="370" right="212" bottom="382">r</char>
        <char left="213" top="370" right="223" bottom="382">a</char>
        <char left="224" top="370" right="234" bottom="384">q</char>
        <char left="236" top="371" right="245" bottom="383">s</char>
        <char left="247" top="371" right="256" bottom="382">n</char>
        <char left="258" top="371" right="268" bottom="383">u</char>
        <char left="270" top="370" right="285" bottom="383">m</char>
        <char left="287" top="370" right="302" bottom="382">
          <charRecVariants>
            <variant charConfidence="22">m</variant>
            <variant charConfidence="-1">rn</variant>
          </charRecVariants>m</char>
        <char left="304" top="370" right="314" bottom="382">e</char>
        <char left="316" top="370" right="322" bottom="382">r</char>
        <char left="324" top="370" right="326" bottom="382" suspicious="true">:</char>
      </line>
    </text>
    <text id="handprint" left="387" top="1035" right="635" bottom="1089">
      <value>309.05</value>
      <line left="398" top="1045" right="633" bottom="1089">
        <char left="398" top="1052" right="426" bottom="1088">3</char>
        <char left="423" top="1061" right="455" bottom="1089" suspicious="true">0</char>
        <char left="482" top="1055" right="505" bottom="1089">9</char>
        <char left="507" top="1084" right="512" bottom="1087">.</char>
        <char left="520" top="1058" right="549" bottom="1089">0</char>
        <char left="546" top="1045" right="633" bottom="1089" suspicious="true">5</char>
      </line>
    </text>
    <checkmark id="checked" left="883" top="427" right="928" bottom="469">
      <value>checked</value>
    </checkmark>
    <checkmark id="not checked" left="884" top="511" right="928" bottom="554">
      <value>unchecked</value>
    </checkmark>
    <barcode id="leftBarcode" left="46" top="1048" right="128" bottom="1350">
      <value encoding="Base64">QkYxMDExNQ==</value>
    </barcode>
  </page>
</document>

I only need the value elements from it. I need to create a new object with as many fields as there are value elements, something like this:

class result{
string first;
string second;
}

The result should only include the text of each value element.

I have tried everything, but I just can't seem to understand how to do it. (To be honest, this is the first time I am dealing with XML files.)

Any suggestions on how to parse the XML file?

o2887

2 Answers

Create a C# class that has the properties you need. Note that you will need separate classes to represent the nested elements and their attributes. Then use XmlSerializer to deserialize your XML, similar to this:

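// Requires: using System.IO; using System.Xml.Serialization;
public static YourClass FromXmlString(string xmlString)
{
    // Dispose the reader once deserialization has finished.
    using (var reader = new StringReader(xmlString))
    {
        var serializer = new XmlSerializer(typeof(YourClass));
        return (YourClass)serializer.Deserialize(reader);
    }
}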

You say you only want some of the fields. It's been a while since I have done this, but I believe that any XML elements and attributes with no matching property in your C# class will simply be ignored. You can also map elements of your XML to differently named C# properties using the XmlElement attribute:

[XmlElement("some-element-name")]
public string MyProperty { get; set; }

You can also map XML attributes using the XmlAttribute attribute:

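[Serializable]
// XmlElement cannot be applied to a class; XmlType names the type instead.
// Put [XmlElement("page")] on the property of the parent class that holds Page.
[XmlType("page")]
public sealed class Page
{
    [XmlAttribute("index")]
    public int Index { get; set; }
}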
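
Applied to the XML in the question, the mapping described above might look like the following sketch. The class names (OcrDocument, OcrPage, OcrField, OcrValues, OcrNs) are hypothetical and chosen only for illustration. It assumes that text, checkmark and barcode are the only field kinds you care about, and it sets the document's default namespace explicitly on each class, since the sample file declares one.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

// Hypothetical helper holding the default namespace of the sample document.
public static class OcrNs
{
    public const string Uri = "http://ocrsdk.com/schema/resultDescription-1.0.xsd";
}

[XmlRoot("document", Namespace = OcrNs.Uri)]
[XmlType(Namespace = OcrNs.Uri)]
public sealed class OcrDocument
{
    [XmlElement("page")]
    public List<OcrPage> Pages { get; set; } = new List<OcrPage>();
}

[XmlType(Namespace = OcrNs.Uri)]
public sealed class OcrPage
{
    [XmlAttribute("index")]
    public int Index { get; set; }

    // One list per element name; each kind of field has the same shape.
    [XmlElement("text")]
    public List<OcrField> Texts { get; set; } = new List<OcrField>();

    [XmlElement("checkmark")]
    public List<OcrField> Checkmarks { get; set; } = new List<OcrField>();

    [XmlElement("barcode")]
    public List<OcrField> Barcodes { get; set; } = new List<OcrField>();
}

[XmlType(Namespace = OcrNs.Uri)]
public sealed class OcrField
{
    [XmlAttribute("id")]
    public string Id { get; set; }

    // Only <value> is mapped; <line>, <char> and the other attributes are not declared here.
    [XmlElement("value")]
    public string Value { get; set; }
}

public static class OcrValues
{
    // Deserializes the document and returns the text of every mapped <value> element.
    public static List<string> Read(string xml)
    {
        var serializer = new XmlSerializer(typeof(OcrDocument));
        using (var reader = new StringReader(xml))
        {
            var doc = (OcrDocument)serializer.Deserialize(reader);
            return doc.Pages
                .SelectMany(p => p.Texts.Concat(p.Checkmarks).Concat(p.Barcodes))
                .Select(f => f.Value)
                .ToList();
        }
    }
}
```

Note that the values come back grouped by kind within each page (text first, then checkmark, then barcode) rather than in strict document order, because each element name is collected into its own list.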

See here for an example: XML string deserialization into c# object

Rob Kent

Let's start in the best way possible, one piece at a time. I'll show you how to create a class for the variant element.

public class XVariant
{
    public readonly XElement self;
    public XVariant(XElement element = null) 
    { 
        self = element ?? new XElement("variant");
    }

    public int CharConfidence
    {
        get { return (int)self.Attribute("charConfidence"); }
        // SetAttributeValue adds the attribute when it is missing and updates it otherwise.
        // (XAttribute has no constructor that takes only a name.)
        set { self.SetAttributeValue("charConfidence", value); }
    }

    public string Value
    {
        get { return self.Value; }
        set { self.Value = value; }
    }
}

Now we can read all the variant elements with:

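// Requires: using System; using System.Collections.Generic; using System.Linq; using System.Xml.Linq;
XElement root = XElement.Load(file); // or .Parse(string)
// The document declares a default namespace, so element names must be qualified with it.
XNamespace ns = root.Name.Namespace;
List<XVariant> variants = root.Descendants(ns + "variant")
                              .Select(x => new XVariant(x))
                              .ToList();
foreach (XVariant variant in variants)
{
    Console.WriteLine(variant.CharConfidence + " " + variant.Value);
}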

As you can see, it is relatively simple to write one class (one per element) at a time when you break it down like this. I wrote it so that you can also add or change values, if your purposes call for that.
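
If all you need is the text of the value elements, the same LINQ to XML approach can be pointed at them directly. The following is a minimal sketch, not a definitive implementation: ValueReader and ReadValues are hypothetical names, and pairing each value with the id attribute of its enclosing element is an assumption about what you want to keep.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ValueReader
{
    // Returns (id, value) pairs for every <value> element, in document order.
    public static List<KeyValuePair<string, string>> ReadValues(string xml)
    {
        XElement root = XElement.Parse(xml);

        // The sample document declares a default namespace; reuse the root's namespace
        // so that the element name matches.
        XNamespace ns = root.Name.Namespace;

        return root.Descendants(ns + "value")
                   .Select(v => new KeyValuePair<string, string>(
                       (string)v.Parent.Attribute("id"), // id of the enclosing text/checkmark/barcode, or null
                       v.Value))
                   .ToList();
    }
}
```

Calling ValueReader.ReadValues(File.ReadAllText(path)) on the file from the question should give one pair per value element. The barcode value is marked as Base64 in the sample, so it would still need decoding, for example with Convert.FromBase64String.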

Chuck Savage