What I want to do

I have a folder that contains PDFs, pictures, etc., plus one XML file.
The XML file holds metadata for each of the other files.
I want to extract that data from the XML and store it in a C# class so I can use it later.

What I already did

I searched for a way to parse the file using LINQ, but I couldn't get it to work the way I want.
I want it to work like this:
I have a list of files stored in my application. I want to loop over each file and get the data for that file from the XML.

What I have

The XML file looks like this:

<?xml version='1.0' encoding='ISO-8859-1' ?>
<FOLDERS Name="XXXXXXX" >
    <FOLDER Date="12/15/2015 15:25:04" ByUser="" Name="some folders name" Type="" MemberOf="">
        <![CDATA[FOLDERID111]]>
        <VISUALFOLDER Date="02/16/2016 14:25:00" ByUser="" Name="some folders name" Type="" StartView="UNKNOWN" ScreenOffset="0"/>
        <TABSHEET Date="02/16/2016 14:25:00" Name="Fields" Type="IdxFields">
            <![CDATA[TABSHEETID521]]>
            <VISUALTABSHEET Date="02/16/2016 14:25:00" Name="Fields" Type="IdxFields"/>
            <INDEXFIELD Date="02/16/2016 14:25:00" Name="DocuName">
                <![CDATA[Something thats not the documents name]]>
                <VISUALINDEXFIELD Date="02/16/2016 14:25:00" Name="DocuName"/>
            </INDEXFIELD>
            <INDEXFIELD Date="02/16/2016 14:25:00" Name="DocuDate">
                <![CDATA[09.12.2015]]>
                <VISUALINDEXFIELD Date="02/16/2016 14:25:00" Name="DocuDate"/>
            </INDEXFIELD>
            <INDEXFIELD Date="02/16/2016 14:25:00" Name="Object">
                <![CDATA[OBJECT1]]>
                <VISUALINDEXFIELD Date="02/16/2016 14:25:00" Name="Object"/>
            </INDEXFIELD>
            <INDEXFIELD Date="02/16/2016 14:25:00" Name="Tag">
                <![CDATA[LETTER]]>
                <VISUALINDEXFIELD Date="02/16/2016 14:25:00" Name="Tag"/>
            </INDEXFIELD>
            <INDEXFIELD Date="02/16/2016 14:25:00" Name="User">
                <![CDATA[USER1]]>
                <VISUALINDEXFIELD Date="02/16/2016 14:25:00" Name="User"/>
            </INDEXFIELD>
            <INDEXFIELD Date="02/16/2016 14:25:00" Name="Note">
                <VISUALINDEXFIELD Date="02/16/2016 14:25:00" Name="Note"/>
            </INDEXFIELD>
            <INDEXFIELD Date="02/16/2016 14:25:00" Name="Barcode">
                <VISUALINDEXFIELD Date="02/16/2016 14:25:00" Name="Barcode"/>
            </INDEXFIELD>
        </TABSHEET>
        <TABSHEET Date="02/16/2016 14:25:00" Name="Documents" Type="Documents" Data="" SeqNo="0" Title="" Password="">
            <![CDATA[TABSHEETID522]]>
            <VISUALTABSHEET Date="02/16/2016 14:25:00" Name="Documents" Type="Documents"/>
            <DOCUMENT Date="02/16/2016 14:25:00" Name="Document" Type="" Data="" FileName="C:\ProgramData\Import\file1.pdf" FileOffset="5712054" FileSize="128509" BinaryType="PDF">
                <VISUALDOCUMENT Date="02/16/2016 14:25:00" Name="Document" Type="" Height="148" Width="105"/>
            </DOCUMENT>
            <DOCUMENT Date="02/16/2016 14:25:00" Name="Document" Type="" Data="" FileName="C:\ProgramData\Import\file2.pdf" FileOffset="5840563" FileSize="129847" BinaryType="PDF">
                <VISUALDOCUMENT Date="02/16/2016 14:25:00" Name="Document" Type="" Height="148" Width="105"/>
            </DOCUMENT>
        </TABSHEET>
    </FOLDER>

<FOLDER Date="12/30/2015 15:25:04" ByUser="" Name="some other folders name" Type="" MemberOf="">
        <![CDATA[FOLDERID111]]>
        <VISUALFOLDER Date="02/16/2016 14:25:00" ByUser="" Name="some other folders name" Type="" StartView="UNKNOWN" ScreenOffset="0"/>
        <TABSHEET Date="02/16/2016 14:25:00" Name="Fields" Type="IdxFields">
            <![CDATA[TABSHEETID521]]>
            <VISUALTABSHEET Date="02/16/2016 14:25:00" Name="Fields" Type="IdxFields"/>
            <INDEXFIELD Date="02/16/2016 14:25:00" Name="DocuName">
                <![CDATA[Something thats not the documents name]]>
                <VISUALINDEXFIELD Date="02/16/2016 14:25:00" Name="DocuName"/>
            </INDEXFIELD>
            <INDEXFIELD Date="02/16/2016 14:25:00" Name="DocuDate">
                <![CDATA[09.12.2015]]>
                <VISUALINDEXFIELD Date="02/16/2016 14:25:00" Name="DocuDate"/>
            </INDEXFIELD>
            <INDEXFIELD Date="02/16/2016 14:25:00" Name="Object">
                <![CDATA[OBJECT1]]>
                <VISUALINDEXFIELD Date="02/16/2016 14:25:00" Name="Object"/>
            </INDEXFIELD>
            <INDEXFIELD Date="02/16/2016 14:25:00" Name="Tag">
                <![CDATA[LETTER]]>
                <VISUALINDEXFIELD Date="02/16/2016 14:25:00" Name="Tag"/>
            </INDEXFIELD>
            <INDEXFIELD Date="02/16/2016 14:25:00" Name="User">
                <![CDATA[USER1]]>
                <VISUALINDEXFIELD Date="02/16/2016 14:25:00" Name="User"/>
            </INDEXFIELD>
            <INDEXFIELD Date="02/16/2016 14:25:00" Name="Note">
                <VISUALINDEXFIELD Date="02/16/2016 14:25:00" Name="Note"/>
            </INDEXFIELD>
            <INDEXFIELD Date="02/16/2016 14:25:00" Name="Barcode">
                <VISUALINDEXFIELD Date="02/16/2016 14:25:00" Name="Barcode"/>
            </INDEXFIELD>
        </TABSHEET>
        <TABSHEET Date="02/16/2016 14:25:00" Name="Documents" Type="Documents" Data="" SeqNo="0" Title="" Password="">
            <![CDATA[TABSHEETID522]]>
            <VISUALTABSHEET Date="02/16/2016 14:25:00" Name="Documents" Type="Documents"/>
            <DOCUMENT Date="02/16/2016 14:25:00" Name="Document" Type="" Data="" FileName="C:\ProgramData\Import\file3.pdf" FileOffset="5712054" FileSize="128509" BinaryType="PDF">
                <VISUALDOCUMENT Date="02/16/2016 14:25:00" Name="Document" Type="" Height="148" Width="105"/>
            </DOCUMENT>
        </TABSHEET>
    </FOLDER>
</FOLDERS>

The XML is generated by another application.
Each "FOLDER" has two "TABSHEET"s. One contains the data (identifiable by its "Name" attribute) and the other contains the file names.
The data is stored in CDATA blocks. Some fields have data, some don't. Not every document has a "Barcode".

My Question

What does a LINQ query look like that does what I want?

Update 1

OK, I fixed my query so that it does almost what I want:

            var test1 = xdoc
                .Element("FOLDERS")
                .Elements("FOLDER")
                .Where(xml => xml
                                .Elements("TABSHEET")
                                .Elements("DOCUMENT")
                                .Select(x => x.Attribute("FileName").Value)
                                .ToList()
                                .Contains(file.FilePath)
                )
                .Select(xml => xml
                                .Elements("TABSHEET")
                                .Elements("INDEXFIELD")
                                .Where(x => 
                                    x.Attribute("Name").Value == "DocuName" ||
                                    x.Attribute("Name").Value == "Note" ||
                                    x.Attribute("Name").Value == "User")
                                .Select(x => (string)x.Value)
                );

The only problem now is how to differentiate the results.
What I mean is this: the query returns an IEnumerable<IEnumerable<string>> containing 3 values times the number of files. But because the inner sequence is just an IEnumerable of strings, I can't tell whether a given string is the "DocuName", the "Note" or the "User".

Is there a way to get a Dictionary with the right keys out of this query?

Simon Balling
  • Show us the linq query you have that doesn't work – Jeremy Thompson Feb 17 '16 at 08:58
  • _"But I couldn't get it to work the way I want to....I want to do a loop over each file and get the data for that file from the xml"_ - too vague and also too improbable –  Feb 17 '16 at 09:04
  • 1
    If using linq is not a must, you can use xsd.exe to generate a class structure which repsents your xml. Then deserializing to this class and accessing data would be fairly easy. – Abdullah Nehir Feb 17 '16 at 09:08
  • By the way, there is an answer about xsd.exe http://stackoverflow.com/questions/4203540/generate-c-sharp-class-from-xml – Abdullah Nehir Feb 17 '16 at 09:09

2 Answers

There are many ways to solve this problem. Since you mentioned that you want equivalent C# entities, I prefer this approach.

Generate C# entities for your XML (there are plenty of tools for this):

[XmlRoot(ElementName="VISUALFOLDER")]
public class VISUALFOLDER {
    [XmlAttribute(AttributeName="Date")]
    public string Date { get; set; }
    [XmlAttribute(AttributeName="ByUser")]
    public string ByUser { get; set; }
    [XmlAttribute(AttributeName="Name")]
    public string Name { get; set; }
    [XmlAttribute(AttributeName="Type")]
    public string Type { get; set; }
    [XmlAttribute(AttributeName="StartView")]
    public string StartView { get; set; }
    [XmlAttribute(AttributeName="ScreenOffset")]
    public string ScreenOffset { get; set; }
}

[XmlRoot(ElementName="VISUALTABSHEET")]
public class VISUALTABSHEET {
    [XmlAttribute(AttributeName="Date")]
    public string Date { get; set; }
    [XmlAttribute(AttributeName="Name")]
    public string Name { get; set; }
    [XmlAttribute(AttributeName="Type")]
    public string Type { get; set; }
}

[XmlRoot(ElementName="VISUALINDEXFIELD")]
public class VISUALINDEXFIELD {
    [XmlAttribute(AttributeName="Date")]
    public string Date { get; set; }
    [XmlAttribute(AttributeName="Name")]
    public string Name { get; set; }
}

[XmlRoot(ElementName="INDEXFIELD")]
public class INDEXFIELD {
    [XmlElement(ElementName="VISUALINDEXFIELD")]
    public VISUALINDEXFIELD VISUALINDEXFIELD { get; set; }
    [XmlAttribute(AttributeName="Date")]
    public string Date { get; set; }
    [XmlAttribute(AttributeName="Name")]
    public string Name { get; set; }
}

[XmlRoot(ElementName="TABSHEET")]
public class TABSHEET {
    [XmlElement(ElementName="VISUALTABSHEET")]
    public VISUALTABSHEET VISUALTABSHEET { get; set; }
    [XmlElement(ElementName="INDEXFIELD")]
    public List<INDEXFIELD> INDEXFIELD { get; set; }
    [XmlAttribute(AttributeName="Date")]
    public string Date { get; set; }
    [XmlAttribute(AttributeName="Name")]
    public string Name { get; set; }
    [XmlAttribute(AttributeName="Type")]
    public string Type { get; set; }
    [XmlElement(ElementName="DOCUMENT")]
    public List<DOCUMENT> DOCUMENT { get; set; }
    [XmlAttribute(AttributeName="Data")]
    public string Data { get; set; }
    [XmlAttribute(AttributeName="SeqNo")]
    public string SeqNo { get; set; }
    [XmlAttribute(AttributeName="Title")]
    public string Title { get; set; }
    [XmlAttribute(AttributeName="Password")]
    public string Password { get; set; }
}

[XmlRoot(ElementName="VISUALDOCUMENT")]
public class VISUALDOCUMENT {
    [XmlAttribute(AttributeName="Date")]
    public string Date { get; set; }
    [XmlAttribute(AttributeName="Name")]
    public string Name { get; set; }
    [XmlAttribute(AttributeName="Type")]
    public string Type { get; set; }
    [XmlAttribute(AttributeName="Height")]
    public string Height { get; set; }
    [XmlAttribute(AttributeName="Width")]
    public string Width { get; set; }
}

[XmlRoot(ElementName="DOCUMENT")]
public class DOCUMENT {
    [XmlElement(ElementName="VISUALDOCUMENT")]
    public VISUALDOCUMENT VISUALDOCUMENT { get; set; }
    [XmlAttribute(AttributeName="Date")]
    public string Date { get; set; }
    [XmlAttribute(AttributeName="Name")]
    public string Name { get; set; }
    [XmlAttribute(AttributeName="Type")]
    public string Type { get; set; }
    [XmlAttribute(AttributeName="Data")]
    public string Data { get; set; }
    [XmlAttribute(AttributeName="FileName")]
    public string FileName { get; set; }
    [XmlAttribute(AttributeName="FileOffset")]
    public string FileOffset { get; set; }
    [XmlAttribute(AttributeName="FileSize")]
    public string FileSize { get; set; }
    [XmlAttribute(AttributeName="BinaryType")]
    public string BinaryType { get; set; }
}

[XmlRoot(ElementName="FOLDER")]
public class FOLDER {
    [XmlElement(ElementName="VISUALFOLDER")]
    public VISUALFOLDER VISUALFOLDER { get; set; }
    [XmlElement(ElementName="TABSHEET")]
    public List<TABSHEET> TABSHEET { get; set; }
    [XmlAttribute(AttributeName="Date")]
    public string Date { get; set; }
    [XmlAttribute(AttributeName="ByUser")]
    public string ByUser { get; set; }
    [XmlAttribute(AttributeName="Name")]
    public string Name { get; set; }
    [XmlAttribute(AttributeName="Type")]
    public string Type { get; set; }
    [XmlAttribute(AttributeName="MemberOf")]
    public string MemberOf { get; set; }
}

[XmlRoot(ElementName="FOLDERS")]
public class FOLDERS {
    [XmlElement(ElementName="FOLDER")]
    public List<FOLDER> FOLDER { get; set; }
    [XmlAttribute(AttributeName="Name")]
    public string Name { get; set; }
}

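Now we can deserialize the file using the snippet below.

// Requires: using System.IO; using System.Xml.Serialization;
var serializer = new XmlSerializer(typeof(FOLDERS));
StreamReader reader = new StreamReader(filepath);
var folders = (FOLDERS)serializer.Deserialize(reader);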

Working Demo

Hari Prasad

I came up with the following solution.

file is an instance of a class in which I store all the data for one file.

            var elements = xdoc.Element("FOLDERS");
            if (elements == null)
            {
                throw new KeyNotFoundException();
            }

            var data = elements
                .Elements("FOLDER")
                .Where(xml => xml
                                .Elements("TABSHEET")
                                .Elements("DOCUMENT")
                                .Select(x => x.Attribute("FileName").Value)
                                .ToList()
                                .Contains(file.FileName)
                )
                .Select(xml => xml
                                .Elements("TABSHEET")
                                .Elements("INDEXFIELD")
                                .Where(x =>
                                    x.Attribute("Name").Value == "Date" ||
                                    x.Attribute("Name").Value == "Note" 
                                    )
                                .Select(x => new string[] { (string)x.Attribute("Name"), (string)x.Value }))
                .ToList();

            if (data.Count != 1)
            {
                file.Upload = false;
                continue;
            }

            var dataDictionary = data[0].ToDictionary(item => item[0],
                item => item[1]);

            file.Date = !dataDictionary.ContainsKey("Date") || string.IsNullOrWhiteSpace(dataDictionary["Date"]) ? new DateTime() : DateTime.Parse(dataDictionary["Date"]);
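
For comparison, the lookup above can also be written so that the dictionary is built directly from the INDEXFIELD elements, without the intermediate string arrays. This is only a minimal sketch under a few assumptions: the class name MetadataLookup, the method name FieldsFor and the shortened inline XML are made up for illustration, the first matching FOLDER is taken (the code above instead skips the file unless exactly one folder matches), and every INDEXFIELD inside a FOLDER is assumed to have a unique, non-null Name attribute.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class MetadataLookup // hypothetical helper, not part of the original code
{
    // Returns the INDEXFIELD values of the first FOLDER that contains a DOCUMENT
    // whose FileName equals filePath, keyed by the field's Name attribute.
    // Returns null if no folder references the file.
    public static Dictionary<string, string> FieldsFor(XDocument xdoc, string filePath)
    {
        var folder = xdoc.Root
            .Elements("FOLDER")
            .FirstOrDefault(f => f
                .Elements("TABSHEET")
                .Elements("DOCUMENT")
                .Any(d => (string)d.Attribute("FileName") == filePath));

        if (folder == null)
        {
            return null;
        }

        // ToDictionary throws if a Name is missing or occurs twice in the folder;
        // the sample XML in the question has neither case.
        return folder
            .Elements("TABSHEET")
            .Elements("INDEXFIELD")
            .ToDictionary(
                field => (string)field.Attribute("Name"),
                field => field.Value.Trim());
    }

    public static void Main()
    {
        // Shortened stand-in for the XML shown in the question.
        var xdoc = XDocument.Parse(@"<FOLDERS Name=""X"">
  <FOLDER Name=""a"">
    <TABSHEET Name=""Fields"">
      <INDEXFIELD Name=""DocuName""><![CDATA[Letter A]]><VISUALINDEXFIELD Name=""DocuName""/></INDEXFIELD>
      <INDEXFIELD Name=""Note""><VISUALINDEXFIELD Name=""Note""/></INDEXFIELD>
    </TABSHEET>
    <TABSHEET Name=""Documents"">
      <DOCUMENT FileName=""C:\ProgramData\Import\file1.pdf""/>
    </TABSHEET>
  </FOLDER>
</FOLDERS>");

        var fields = FieldsFor(xdoc, @"C:\ProgramData\Import\file1.pdf");
        foreach (var pair in fields)
        {
            Console.WriteLine(pair.Key + " = '" + pair.Value + "'");
        }
    }
}
```

Fields without a CDATA block, such as "Note" or "Barcode" in the question's XML, end up in the dictionary with an empty string as their value, so the caller only has to check for empty values rather than missing keys.
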
Simon Balling