
I asked a question earlier and got the tip to use XML deserialization to parse my XML contents into C# objects. After some googling and messing around I got deserialization working, but I have a follow-up question.

My XML file looks like this: (This is just a part of the file)

   <osm>
      <n id="2638006578" l="5.9295547" b="52.5619519" />
      <n id="2638006579" l="5.9301973" b="52.5619526" />
      <n id="2638006581" l="5.9303625" b="52.5619565" />
      <n id="2638006583" l="5.9389539" b="52.5619577" />
      <n id="2638006589" l="5.9386643" b="52.5619733" />
      <n id="2638006590" l="5.9296231" b="52.5619760" />
      <n id="2638006595" l="5.9358987" b="52.5619864" />
      <n id="2638006596" l="5.9335913" b="52.5619865" />
      <w id="453071384">
        <nd rf="2638006581" />
        <nd rf="2638006590" />
        <nd rf="2638006596" />
        <nd rf="2638006583" />
        <nd rf="2638006578" />
      </w>
      <w id="453071385">
        <nd rf="2638006596" />
        <nd rf="2638006578" />
        <nd rf="2638006581" />
        <nd rf="2638006583" />
      </w>
   </osm>

I managed to deserialize this into objects and it works as it should. The problem is as follows: the <nd> elements under the <w> elements have a reference ID which matches the ID of an <n> element. Multiple <w> elements can reference the same <n> element, hence the separate <n> elements.

Currently, code-wise, I have a NodeReference object that represents the <nd> elements, but I want to link the Way class directly to the Node class based on the reference ID and the Node ID. So basically the Way class should have a List of Nodes rather than a list of NodeReferences. I should also have a separate list of nodes, to prevent Ways from holding new instances with the same data (e.g. if two Ways reference the same Node, they should both point to the same Node instance rather than to two identical Node instances).

I basically need to access the Lon/Lat/ID fields from a Node instance based on the NodeReference ID.

Here's my code:

DataCollection class

    [XmlRoot("osm")]
    public class DataCollection {

        [XmlElement("n")]
        public List<Node> Nodes { get; private set; }

        [XmlElement("w")]
        public List<Way> Ways { get; private set; }

        public DataCollection() {
            this.Nodes = new List<Node>();
            this.Ways = new List<Way>();
        }
    }

The node class

[Serializable()]
public class Node {

    [XmlAttribute("id", DataType = "long")]
    public long ID { get; set; }

    [XmlAttribute("b", DataType = "double")] // the sample XML stores latitude in "b"; there is no "w" attribute on <n>
    public double Lat { get; set; }

    [XmlAttribute("l", DataType = "double")]
    public double Lon { get; set; }
}

Way

    [Serializable()]
    public class Way {

        [XmlAttribute("id", DataType = "long")]
        public long ID { get; set; }

        [XmlElement("nd")]
        public List<NodeReference> References { get; private set; }

        public Way() {
            this.References = new List<NodeReference>();
        }
    }

NodeReference

    [Serializable()]
    public class NodeReference {

        [XmlAttribute("rf", DataType = "long")]
        public long ReferenceID { get; set; }
    }

Reading the XML file

    public static void Read() {
        XmlSerializer serializer = new XmlSerializer(typeof(DataCollection));
        using (FileStream fileStream = new FileStream(@"path/to/file.xml", FileMode.Open)) {
            DataCollection result = (DataCollection)serializer.Deserialize(fileStream);
            // Example Requested usage: result.Ways[0].Nodes 
        }

        Console.Write("");
    }

Thanks in advance! If you have questions or answers please do let me know!

Dubb

2 Answers


I agree with @AlexanderPetrov, it's easiest to link the objects after deserialization. I'm using a dictionary for fast lookup and an additional Node property on the NodeReference class.

var NodeById = this.Nodes.ToDictionary(n => n.ID, n => n);
foreach (var way in this.Ways) {
    foreach (var nd in way.References) {
        nd.Node = NodeById[nd.ReferenceID];
    }
}

The following code is runnable in LINQPad after importing the System.Xml.Serialization namespace via Query Properties.

void Main()
{
    XmlSerializer serializer = new XmlSerializer(typeof(DataCollection));

    using (FileStream fileStream = new FileStream(@"file.xml", FileMode.Open)) {
        DataCollection result = (DataCollection)serializer.Deserialize(fileStream);
        result.Index();

        result.Ways[0].References[0].Node.Lon.Dump();
        // -> 5,9303625 (the decimal separator shown depends on the current culture)
    }
}


// --------------------------------------------------------------------------
[XmlRoot("osm")]
public class DataCollection {
    [XmlElement("n")]
    public List<Node> Nodes = new List<Node>();

    [XmlElement("w")]
    public List<Way> Ways = new List<Way>();

    public void Index() {
        var NodeById = this.Nodes.ToDictionary(n => n.ID, n => n);
        foreach (var way in this.Ways) {
            foreach (var nd in way.References) {
                nd.Node = NodeById[nd.ReferenceID];
            }
        }
    }
}

// --------------------------------------------------------------------------
[Serializable()]
public class Node {
    [XmlAttribute("id", DataType = "long")]
    public long ID { get; set; }

    [XmlAttribute("b", DataType = "double")] // the sample XML stores latitude in "b"; there is no "w" attribute on <n>
    public double Lat { get; set; }

    [XmlAttribute("l", DataType = "double")]
    public double Lon { get; set; }
}

// --------------------------------------------------------------------------
[Serializable()]
public class Way {
   [XmlAttribute("id", DataType = "long")]
   public long ID { get; set; }

   [XmlElement("nd")]
   public List<NodeReference> References = new List<NodeReference>();
}

// --------------------------------------------------------------------------
[Serializable()]
public class NodeReference {
    [XmlAttribute("rf", DataType = "long")]
    public long ReferenceID { get; set; }

    [XmlIgnore]
    public Node Node { get; set; }
}
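
If you would rather get the `result.Ways[0].Nodes` shape the question asks for, the same dictionary lookup can fill a list on each Way instead of a property on each NodeReference. This is only a minimal sketch under a few assumptions of mine: the `Demo` class and its `Load` helper are hypothetical names, references to unknown node IDs are skipped silently (the code above would throw a KeyNotFoundException instead), and latitude is read from the `b` attribute as in the sample XML. `ToDictionary` still throws if two `<n>` elements share an ID.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

[XmlRoot("osm")]
public class DataCollection
{
    [XmlElement("n")] public List<Node> Nodes = new List<Node>();
    [XmlElement("w")] public List<Way> Ways = new List<Way>();

    // Resolves every <nd rf="..."> to the shared Node instance with that ID.
    public void Index()
    {
        var nodeById = Nodes.ToDictionary(n => n.ID);
        foreach (var way in Ways)
        {
            way.Nodes.Clear(); // makes Index() safe to call more than once
            foreach (var nd in way.References)
            {
                Node node;
                // Assumption: a reference to a missing node is skipped, not an error.
                if (nodeById.TryGetValue(nd.ReferenceID, out node))
                    way.Nodes.Add(node);
            }
        }
    }
}

public class Node
{
    [XmlAttribute("id")] public long ID { get; set; }
    [XmlAttribute("b")] public double Lat { get; set; }
    [XmlAttribute("l")] public double Lon { get; set; }
}

public class NodeReference
{
    [XmlAttribute("rf")] public long ReferenceID { get; set; }
}

public class Way
{
    [XmlAttribute("id")] public long ID { get; set; }
    [XmlElement("nd")] public List<NodeReference> References = new List<NodeReference>();

    // Not part of the XML; filled in by DataCollection.Index().
    [XmlIgnore] public List<Node> Nodes = new List<Node>();
}

public static class Demo // hypothetical helper, not from the code above
{
    public static DataCollection Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(DataCollection));
        using (var reader = new StringReader(xml))
        {
            var data = (DataCollection)serializer.Deserialize(reader);
            data.Index();
            return data;
        }
    }

    public static void Main()
    {
        var data = Load(
            "<osm><n id='1' l='5.9' b='52.5' />" +
            "<w id='10'><nd rf='1' /></w>" +
            "<w id='11'><nd rf='1' /><nd rf='99' /></w></osm>");

        Console.WriteLine(data.Ways[0].Nodes[0].ID); // 1
        // Both ways point at the very same Node object:
        Console.WriteLine(ReferenceEquals(data.Ways[0].Nodes[0], data.Ways[1].Nodes[0])); // True
    }
}
```

Building the dictionary once keeps linking roughly linear in the number of nodes plus references, and every Way ends up sharing the Node instances held in `DataCollection.Nodes`.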
Tomalak
  • Apparently this is way faster than plainly looping. It took me ~1.6 s to fetch the data from 500K lines. I guess this will be my fix! Thanks man! – Dubb Dec 08 '16 at 14:24
  • I've tested this code against a relatively large XML file (~25,000 `<n>` elements and a few thousand copies of `<w>` with 5 node references each, 1.5 MB of XML overall). LINQPad shows execution times well below 100 ms for me. YMMV. – Tomalak Dec 08 '16 at 14:29

I don't think this is possible with the standard XmlSerializer alone.

You can try to do this by implementing the IXmlSerializable interface or using a custom XmlReader.

However, the easiest way is to fill in the desired collection manually after deserializing, with the following code (First requires the System.Linq namespace):

DataCollection result = (DataCollection)serializer.Deserialize(fileStream);

foreach (var way in result.Ways)
    foreach (var nodeReference in way.References)
        way.Nodes.Add(result.Nodes.First(node => node.ID == nodeReference.ReferenceID));

Add a Nodes property to the Way class:

public class Way
{
    [XmlAttribute("id", DataType = "long")]
    public long ID { get; set; }

    [XmlElement("nd")]
    public List<NodeReference> References { get; private set; }

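    // Filled in after deserialization; not part of the XML itself.
    [XmlIgnore]
    public List<Node> Nodes { get; set; }

    public Way()
    {
        this.References = new List<NodeReference>();
        // Without this, way.Nodes.Add(...) throws a NullReferenceException.
        this.Nodes = new List<Node>();
    }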
}
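
For the custom XmlReader route mentioned above, a rough sketch could look like the following. It is not a drop-in replacement for the classes in the question: `OsmStreamReader` and `ReadWays` are hypothetical names, the Node and Way classes are simplified to plain fields, and it assumes every `<n>` element appears before the `<w>` elements that reference it (as in the sample file). References to unknown IDs are skipped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public class Node
{
    public long ID;
    public double Lat;
    public double Lon;
}

public class Way
{
    public long ID;
    public List<Node> Nodes = new List<Node>();
}

public static class OsmStreamReader // hypothetical helper name
{
    // Reads <n> and <w> elements in a single forward pass and links each
    // <nd rf="..."> directly to the Node instance stored in nodesById.
    public static List<Way> ReadWays(TextReader input, Dictionary<long, Node> nodesById)
    {
        var ways = new List<Way>();
        Way current = null; // the <w> element we are currently inside, if any

        using (var reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.EndElement)
                {
                    if (reader.Name == "w") current = null;
                    continue;
                }
                if (reader.NodeType != XmlNodeType.Element) continue;

                switch (reader.Name)
                {
                    case "n":
                        var node = new Node
                        {
                            ID = XmlConvert.ToInt64(reader.GetAttribute("id")),
                            Lon = XmlConvert.ToDouble(reader.GetAttribute("l")),
                            Lat = XmlConvert.ToDouble(reader.GetAttribute("b"))
                        };
                        nodesById[node.ID] = node;
                        break;

                    case "w":
                        var way = new Way { ID = XmlConvert.ToInt64(reader.GetAttribute("id")) };
                        ways.Add(way);
                        // <w ... /> has no end tag, so there is nothing to wait for.
                        current = reader.IsEmptyElement ? null : way;
                        break;

                    case "nd":
                        Node target;
                        if (current != null &&
                            nodesById.TryGetValue(XmlConvert.ToInt64(reader.GetAttribute("rf")), out target))
                        {
                            current.Nodes.Add(target);
                        }
                        break;
                }
            }
        }
        return ways;
    }

    public static void Main()
    {
        var nodes = new Dictionary<long, Node>();
        var ways = ReadWays(new StringReader(
            "<osm><n id='1' l='5.9' b='52.5' /><n id='2' l='6.0' b='52.6' />" +
            "<w id='10'><nd rf='2' /><nd rf='1' /></w>" +
            "<w id='11'><nd rf='1' /></w></osm>"), nodes);

        Console.WriteLine(ways[0].Nodes.Count); // 2
        Console.WriteLine(ReferenceEquals(ways[0].Nodes[1], ways[1].Nodes[0])); // True
    }
}
```

Because nothing is buffered beyond the node dictionary and the resulting ways, this may suit very large files, at the cost of hand-written parsing and the ordering assumption.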
Alexander Petrov
  • My XML files are very large; the current one I'm testing with is 500K lines. If I apply O(N²) complexity to this software it would be very slow. – Dubb Dec 08 '16 at 14:12