14

I need to Convert a CSV into an XML document. The examples I have seen so far, all show how to do this with a fixed number of columns in the CSV.

I have this so far, using LINQ:

String[] File = File.ReadAllLines(@"C:\text.csv");

        String xml = "";

        XElement top = new XElement("TopElement",

        from items in File

        let fields = items.Split(';')

        select new XElement("Item",

        new XElement("Column1", fields[0]),

        new XElement("Column2", fields[1]),

        new XElement("Column3", fields[2]),

        new XElement("Column4", fields[3]),

        new XElement("Column5", fields[4])

        )

        );

        File.WriteAllText(@"C:\xmlout.xml", xml + top.ToString());

This is for a fixed amount of columns, but my .CSV has a different number of columns on each line.

How would you fit some sort of loop into this, depending on how many words (columns) there are in each line of the .CSV?

Thnx

Soeren
  • 1,023
  • 4
  • 22
  • 41

5 Answers5

34
var lines = File.ReadAllLines(@"C:\text.csv");

var xml = new XElement("TopElement",
   lines.Select(line => new XElement("Item",
      line.Split(';')
          .Select((column, index) => new XElement("Column" + index, column)))));

xml.Save(@"C:\xmlout.xml");

Input:

A;B;C
D;E;F
G;H

Output:

<TopElement>
  <Item>
    <Column0>A</Column0>
    <Column1>B</Column1>
    <Column2>C</Column2>
  </Item>
  <Item>
    <Column0>D</Column0>
    <Column1>E</Column1>
    <Column2>F</Column2>
  </Item>
  <Item>
    <Column0>G</Column0>
    <Column1>H</Column1>
  </Item>
</TopElement>
dtb
  • 213,145
  • 36
  • 401
  • 431
  • Pretty close. But I need to split eveery word on each line seperated by a ; So, the output XML would look something like this: word word word word word word word I almost had it with the code I posted, but I only got the first word in each line. So I need to add some sort of loop, that adds a Column element for each word on each line seperated by a ; – Soeren Jun 18 '10 at 13:13
  • @Soeren: Isn't that what my solution does? I've added an example for clarification. – dtb Jun 18 '10 at 18:11
  • This works great. The line.Split(';') part wasn't in the first example, and I tried to add it myself. I just didn't know how to add it. I need to study this LINQ stuff some more. Thanks for your help. – Soeren Jun 18 '10 at 19:06
  • Great post...used bits and pieces for mine and it worked flawlessly – afreeland Mar 26 '13 at 19:41
  • It saved me a lot time. Thanks for this great solution! – Gaurang Jadia Nov 01 '13 at 17:22
  • 2
    Responding a little late, but if your columns can contain the delimiter within a quoted column, be aware that string .split() doesn't pay attention to the quoted strings. Example: var x = "\"Quoted;string\""; var words = x.Split(';'); words[0]; // "Quoted words[1]; // "string" So, if your delimiter is a common character, you will have extra columns of data. – Oblivion2000 Mar 17 '16 at 13:18
  • @dtb if i need `"A;B"` as single element – Meer Jul 29 '16 at 08:19
  • Hi @dtb, Great solution. But I was wondering how do we add the columns as self closing tag elements so they would be `` instead of `D`. Would really appreciate your help. Tried to look around and this seems like the best solution but need to figure out how to make it self closing. – Jay Oct 28 '16 at 15:48
12

In case you want use the headers as the elements names:

var lines = File.ReadAllLines(@"C:\text.csv");
string[] headers = lines[0].Split(',').Select(x => x.Trim('\"')).ToArray();

var xml = new XElement("TopElement",
   lines.Where((line, index) => index > 0).Select(line => new XElement("Item",
      line.Split(',').Select((column, index) => new XElement(headers[index], column)))));

xml.Save(@"C:\xmlout.xml");
Vlax
  • 1,447
  • 1
  • 18
  • 24
4

I wrote a class that derives from Vlax's snippet. In addition I have provided a unit test to document the workflow.

Unit Test:

[TestMethod]
public void convert_csv_to_xml()
{
    // Setup
    var csvPath = @"Testware\vendor.csv";
    var xmlPath = @"Testware\vendor.xml";

    // Test
    var success = DocumentConverter.Instance.CsvToXml(csvPath, xmlPath);

    // Verify
    var expected = File.Exists(xmlPath) && success;
    Assert.AreEqual(true, expected);
}

CSV to XML:

public class DocumentConverter
{
    #region Singleton
    static DocumentConverter _documentConverter = null;

    private DocumentConverter() { }

    public static DocumentConverter Instance
    {
        get
        {
            if (_documentConverter == null)
            {
                _documentConverter = new DocumentConverter();
            }

            return _documentConverter;
        }
    }
    #endregion

    public bool CsvToXml(string sourcePath, string destinationPath)
    {
        var success = false;

        var fileExists = File.Exists(sourcePath);

        if (!fileExists)
        {
            return success;
        }

        var formatedLines = LoadCsv(sourcePath);
        var headers = formatedLines[0].Split(',').Select(x => x.Trim('\"').Replace(" ", string.Empty)).ToArray();

        var xml = new XElement("VendorParts",
           formatedLines.Where((line, index) => index > 0).
               Select(line => new XElement("Part",
                  line.Split(',').Select((field, index) => new XElement(headers[index], field)))));

        try
        {
            xml.Save(destinationPath);

            success = true;
        }
        catch (Exception ex)
        {
            success = false;

            var baseException = ex.GetBaseException();
            Debug.Write(baseException.Message);
        }

        return success;
    }

    private List<string> LoadCsv(string sourcePath)
    {
        var lines = File.ReadAllLines(sourcePath).ToList();

        var formatedLines = new List<string>();

        foreach (var line in lines)
        {
            var formatedLine = line.TrimEnd(',');
            formatedLines.Add(formatedLine);
        }
        return formatedLines;
    }
}

NOTE:

I extended Vlax's solution by removing a trailing comma for each of the CSV line entries that caused a runtime exception based on an index being out of bounds in relation to the column header.

Scott Nimrod
  • 11,206
  • 11
  • 54
  • 118
1

Cinchoo ETL - an open source library available to do the conversion of CSV to Xml easily with few lines of code

For a sample CSV:

string csv = @"Id, Name, City
1, Tom, NY
2, Mark, NJ
3, Lou, FL
4, Smith, PA
5, Raj, DC
";

StringBuilder sb = new StringBuilder();

using (var p = ChoCSVReader.LoadText(csv)
    .WithFirstLineHeader()
    )
{
    using (var w = new ChoXmlWriter(sb)
        .Configure(c => c.RootName = "Employees")
        .Configure(c => c.NodeName = "Employee")
        )
        w.Write(p);
}

Console.WriteLine(sb.ToString());

Output Xml:

<Employees>
  <Employee>
    <Id>1</Id>
    <Name>Tom</Name>
    <City>NY</City>
  </Employee>
  <Employee>
    <Id>2</Id>
    <Name>Mark</Name>
    <City>NJ</City>
  </Employee>
  <Employee>
    <Id>3</Id>
    <Name>Lou</Name>
    <City>FL</City>
  </Employee>
  <Employee>
    <Id>4</Id>
    <Name>Smith</Name>
    <City>PA</City>
  </Employee>
  <Employee>
    <Id>5</Id>
    <Name>Raj</Name>
    <City>DC</City>
  </Employee>
</Employees>

Checkout CodeProject article for some additional help.

Disclaimer: I'm the author of this library.

Cinchoo
  • 6,088
  • 2
  • 19
  • 34
  • My CSV data is surrounded by quotes, ie: "Value 1", "Value 2" etc, and the Cinchoo output preserves the quotes, ie: "Value 1""Value 2", how do I get it to suppress the quotes, I'd like it to output Value 1Value 2. – Chris Feb 07 '22 at 19:52
  • Use `MayHaveQuotedFields()` to reader. – Cinchoo Feb 07 '22 at 21:37
0

Here provides a solution without using nested LINQ, simpler to understand.

  • uses Linq to Xml.
  • supports different delimiter (const)
  • support each row has different slices

The content of input.csv:

A,B,C
D,E,F
G,H

The code for process:

Program.cs

using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

namespace CSVtoXML
{
    class Program
    {
        private static void AddContentForEachLine(string line, ref XElement xmlTree)
        {
            var currentTree = new XElement("Item");
            const string delimiter = ",";     // Can be changed based on the actual situation
            string[] slices = line.Split(delimiter);

            for (int i = 0; i < slices.Count(); i++)
                currentTree.Add(new XElement($"Column{i}", slices[i].ToString()));

            xmlTree.Add(currentTree);
        }

        static void Main(string[] args)
        {
            var basePath = Environment.CurrentDirectory;
            var lines = File.ReadAllLines(Path.Combine(basePath, "../../..", @"input.csv"));

            var xmlTree = new XElement("TopElement");

            foreach (var line in lines)
            {
                AddContentForEachLine(line, ref xmlTree);
            }

            xmlTree.Save(Path.Combine(basePath, "../../..", @"output.xml"));
        }
    }
}

After running the code, the result is below:

<?xml version="1.0" encoding="utf-8"?>
<TopElement>
  <Item>
    <Column0>A</Column0>
    <Column1>B</Column1>
    <Column2>C</Column2>
  </Item>
  <Item>
    <Column0>D</Column0>
    <Column1>E</Column1>
    <Column2>F</Column2>
  </Item>
  <Item>
    <Column0>G</Column0>
    <Column1>H</Column1>
  </Item>
</TopElement>


The complete visual studio solution for this code can be seen here: https://github.com/yanglr/dotnetInterview/tree/master/CSVtoXML.

Bravo Yeung
  • 8,654
  • 5
  • 38
  • 45