How to check the values of attributes are in ascending order and also find duplicates?

Question

Here is a sample XML file:

<?xml version="1.0"?>
<catalog>
    <book id="bk101">
        <author>Gambardella, Matthew</author>
        <title>XML Developer's Guide</title>
        <genre>Computer</genre>
        <price>44.95</price>
        <publish_date>2000-10-01</publish_date>
        <description>An in-depth look at creating applications
        with XML.</description>
    </book>
    <book id="bk102">
        <author>Ralls, Kim</author>
        <title>Midnight Rain</title>
        <genre>Fantasy</genre>
        <price>5.95</price>
        <publish_date>2000-12-16</publish_date>
        <description>A former architect battles corporate zombies,
            an evil sorceress, and her own childhood to become queen
        of the world.</description>
    </book>
    <book id="bk102">
        <author>Corets, Eva</author>
        <title>Maeve Ascendant</title>
        <genre>Fantasy</genre>
        <price>5.95</price>
        <publish_date>2000-11-17</publish_date>
        <description>After the collapse of a nanotechnology
            society in England, the young survivors lay the
        foundation for a new society.</description>
    </book>
    <book id="bk103">
        <author>Corets, Eva</author>
        <title>Oberon's Legacy</title>
        <genre>Fantasy</genre>
        <price>5.95</price>
        <publish_date>2001-03-10</publish_date>
        <description>In post-apocalypse England, the mysterious
            agent known only as Oberon helps to create a new life
            for the inhabitants of London. Sequel to Maeve
        Ascendant.</description>
    </book>
</catalog>

How do I check, in the simplest possible way, whether the values of the id attribute on the <book> nodes are in ascending order, and also whether there are any duplicate values? This is what I tried:

static void Main(string[] args)
{

    XDocument myfile = XDocument.Parse(File.ReadAllText(@"D:\sample_xml.xml"));
    var check = myfile.Descendants("book").Select(a => a.Attribute("id").Value.Substring(2)).ToArray();

    if (IsSortedAscending(check))
    {
        Console.WriteLine("Sorted in Ascending order");
    }
    else
    {
        Console.WriteLine("Check Sequence");
    }

    Console.ReadLine();
}


public static bool IsSortedAscending(string[] arr)
{
    for (int i = arr.Length - 2; i >= 0; i--)
    {
        if (arr[i].CompareTo(arr[i + 1]) > 0)
        {
            return false;
        }
    }
    return true;
}

But it doesn't account for duplicate values. How do I do that?

Also, is it possible to find missing values (if any) in the id attribute? E.g. if there is bk109 and the next one is bk112, the program should show that bk110 and bk111 are missing.

Use XML deserialization to load the XML data into objects and then use LINQ. https://stackoverflow.com/questions/18340427/read-from-xml-file-into-c-sharp-class-using-serialization — Rudresha Parameshappa, Apr 22 '18 at 07:17
@RudreshaParameshappa: Why bother deserializing? LINQ to XML makes this trivial. — Jon Skeet, Apr 22 '18 at 07:18

Jon Skeet · Accepted Answer · 2018-04-22T07:42:14.290

You're nearly there already - the only difference between "strictly ascending, no duplicates" and "ascending, allowing duplicates" is what you do when the result of the comparison is 0 (i.e. the value is the same as the previous one).

You just need to change your IsSortedAscending method to return false if the result of the comparison is >= 0 rather than just > 0:

public static bool IsSortedAscending(string[] arr)
{
    for (int i = arr.Length - 2; i >= 0; i--)
    {
        // Fail if this ID is equal to or bigger than the next one.
        if (arr[i].CompareTo(arr[i + 1]) >= 0)
        {
            return false;
        }
    }
    return true;
}
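
If you also want to know which IDs are repeated, rather than only that the check failed, grouping the values is one option. This is a minimal sketch, not part of the original answer; the class name DuplicateIds and the method name FindDuplicates are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateIds
{
    // Hypothetical helper: returns each ID that occurs more than once,
    // in order of first appearance.
    public static List<string> FindDuplicates(IEnumerable<string> ids)
    {
        return ids.GroupBy(id => id)           // one group per distinct ID
                  .Where(g => g.Count() > 1)   // keep only repeated IDs
                  .Select(g => g.Key)
                  .ToList();
    }

    public static void Main()
    {
        // IDs taken from the sample XML in the question.
        var ids = new[] { "bk101", "bk102", "bk102", "bk103" };
        foreach (var id in FindDuplicates(ids))
        {
            Console.WriteLine("Duplicate ID: " + id); // prints "Duplicate ID: bk102"
        }
    }
}
```

Unlike the sorted-order check, this works regardless of the order in which the IDs appear.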

(You could also use Skip and Zip as an alternative way of comparing elements pairwise, but that's a slightly different matter.)
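
For reference, the Skip/Zip variant could look roughly like the sketch below. It pairs each element with its successor and requires every pair to be strictly increasing. The class and method names are illustrative, and the generic constraint is an assumption made so the same code works for strings or integers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairwiseCheck
{
    // Hypothetical helper: true when every element is strictly less than the next one.
    // Note that the sequence is enumerated twice, which is fine for arrays and lists.
    public static bool IsStrictlyAscending<T>(IEnumerable<T> values)
        where T : IComparable<T>
    {
        // Zip pairs element i with element i + 1 and stops when the shorter sequence ends.
        return values.Zip(values.Skip(1), (current, next) => current.CompareTo(next) < 0)
                     .All(ok => ok);
    }

    public static void Main()
    {
        Console.WriteLine(IsStrictlyAscending(new[] { 101, 102, 103 }));      // True
        Console.WriteLine(IsStrictlyAscending(new[] { 101, 102, 102, 103 })); // False
    }
}
```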

Note that currently your code may fail if your numbers are of different lengths. For example, consider IDs "bk99" and "bk100". That will compare "99" with "100" as strings and decide that "99" comes after "100".
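
A quick way to see the problem is to compare the same two values as text and as numbers. The sketch below uses string.CompareOrdinal so the result does not depend on the current culture; the CompareTo call in the question is culture-sensitive, but should order plain digit strings the same way.

```csharp
using System;

public static class CompareDemo
{
    public static void Main()
    {
        // Compared as text, "99" sorts after "100" because '9' is greater than '1'.
        Console.WriteLine(string.CompareOrdinal("99", "100") > 0); // True

        // Compared as numbers, 99 comes before 100.
        Console.WriteLine((99).CompareTo(100) < 0); // True
    }
}
```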

If your IDs are always really "bk" followed by an integer, I would parse them early:

var ids = myfile.Descendants("book")
                .Select(a => a.Attribute("id").Value.Substring(2))
                .Select(id => int.Parse(id))
                .ToArray();

You'd then change your method to accept an int[] instead of a string[].
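
Adapted to integers, the method might look like the following sketch. It keeps the same loop as the string version and only swaps CompareTo for a direct comparison; the wrapping class name SortCheck is illustrative.

```csharp
using System;

public static class SortCheck
{
    // Same loop as the string version, but on integers.
    // Equal neighbours (duplicates) fail the check because of the >= comparison.
    public static bool IsSortedAscending(int[] arr)
    {
        for (int i = arr.Length - 2; i >= 0; i--)
        {
            if (arr[i] >= arr[i + 1])
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(IsSortedAscending(new[] { 99, 100, 101 }));  // True
        Console.WriteLine(IsSortedAscending(new[] { 101, 102, 102 })); // False
    }
}
```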

At that point, it's much easier to check for "missing" IDs too - in string form, there's no real concept of a "missing" ID, as you could have "bk101", "bk101a", "bk101c" - is "bk101b" missing there? If so, what about "bk101aa"? With integers, it's much simpler.

Once you've got your array of integer IDs, you can use the length of the array to check whether any values are missing:

if (ids.Length > 0 && ids.Length - 1 != ids.Last() - ids.First())
{
    Console.WriteLine("At least one ID is missing");
}

That won't tell you which ID is missing, admittedly.
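
If you do need the actual missing values, one possible approach is to walk the range from the smallest to the largest ID and report whatever is absent. This is a sketch under the assumption that the IDs have already been parsed to integers as shown above; MissingIds and FindMissing are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MissingIds
{
    // Hypothetical helper: returns every integer between the smallest and
    // largest ID that does not appear in the input, in ascending order.
    public static List<int> FindMissing(int[] ids)
    {
        if (ids.Length == 0)
        {
            return new List<int>();
        }

        int min = ids.Min();
        int max = ids.Max();
        var present = new HashSet<int>(ids);

        return Enumerable.Range(min, max - min + 1)
                         .Where(n => !present.Contains(n))
                         .ToList();
    }

    public static void Main()
    {
        // The example from the question: bk109 followed by bk112.
        foreach (int id in FindMissing(new[] { 109, 112 }))
        {
            Console.WriteLine("Missing: bk" + id); // prints bk110, then bk111
        }
    }
}
```

Because it uses the minimum and maximum rather than the first and last elements, this sketch does not require the input to be sorted or free of duplicates.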

@TamalBanerjee: Ah, I'd missed that part. I'll edit - along with an aspect of numbers... — Jon Skeet, Apr 22 '18 at 07:37

score -1 · Answer 2 · answered Apr 22 '18 at 08:23

I would just sort the elements and put them into a dictionary:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";
        static void Main(string[] args)
        {
            XDocument doc = XDocument.Load(FILENAME);

            XElement catalog = doc.Root;

            Dictionary<string, List<XElement>> dict = catalog.Elements("book")
                .OrderBy(x => (string)x.Attribute("id"))
                .ThenBy(x => (DateTime)x.Element("publish_date"))
                .GroupBy(x => (string)x.Attribute("id"), y => y)
                .ToDictionary(x => x.Key, y => y.ToList());
        }
    }
}

jdweng


why did you use "publish_date"? I only need to query the `book` nodes and nothing else – Tamal Banerjee Apr 22 '18 at 09:30
What are you calling a duplicate? Usually with duplicates you want to take latest so I sorted. – jdweng Apr 22 '18 at 09:45
I do not wish to update the file, I just want to check whether the values of `id` are in ascending order and whether there are any duplicate `id` values of the node `book`... – Tamal Banerjee Apr 22 '18 at 09:49
