1

The program should take csv file as input and output XML .The Code is as follows

    private static void ConvertCSVToXML()
    {
        string[] source = File.ReadAllLines("data.csv");
        string RootNameStartTag = "<" + Path.GetFileName("\\CSVTOXML\\CSV-XML\\bin\\Debug\\data.csv") + ">";
        RootNameStartTag = RootNameStartTag.Replace(".csv", "");
        string RootNameEndTag = RootNameStartTag.Insert(1, "/");
        StreamWriter writeFile = new StreamWriter("Output.xml");
        string[] headers = source[0].Split(',');
        source = source.Where(w => w != source[0]).ToArray();
        string[] fields = new string[] { };
        XElement xmlElement ;          
        for (int i = 0; i < source.Length; i++)
        {           

            writeFile.WriteLine(RootNameStartTag);

            fields = source[i].Split(',');                
            for (int j = 0; j < fields.Length; j++)
            {

                xmlElement = new XElement(new XElement(headers[j], fields[j]));
                writeFile.Write(xmlElement);
                writeFile.WriteLine();

            }
            writeFile.WriteLine(RootNameEndTag);
            fields = null;  
        }
    }

The Only Problem with the above code is that it splits data based on commas (,) so if i have row in csv as A,"DEF,XYZ,GHI","FDNFB,dfhjd"

then field[0]=A field[1]="DEF field[3]=XYZ field[4]=GHI" field[5]="FDNB field[6]=dfhjd" but i need the output as field[0]=A field[1]=DEF,XYZ,GHI field[2]=FDNFB,dfhjd Kindly Help Me to Split based on the above Pattern

John Saunders
  • 160,644
  • 26
  • 247
  • 397
Vinodh
  • 385
  • 1
  • 4
  • 15

6 Answers6

2

The TextFieldParser built into .Net handles fields with quotes. It's in the Microsoft.VisualBasic.FileIO but can be used from c# or any other .Net language. The following test code:

public static class TextFieldParserTest
{
    public static void Test()
    {
        var csv = @"""DEF,XYZ,GHI"",""FDNFB,dfhjd""";

        using (var stream = new StringReader(csv))
        using (TextFieldParser parser = new TextFieldParser(stream))
        {
            parser.SetDelimiters(new string[] { "," });
            parser.HasFieldsEnclosedInQuotes = true; // Actually already default

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                Debug.WriteLine(fields.Length);
                foreach (var field in fields)
                    Debug.WriteLine(field); 
            }
        }
    }
}

Gives the following output:

2
DEF,XYZ,GHI
FDNFB,dfhjd
dbc
  • 104,963
  • 20
  • 228
  • 340
  • This is the best solution to process CSV file which include quotes and comma in the data....Thank a so much. I will use this parser to convert csv to xml. – Manjunath Patelappa Nov 10 '22 at 16:43
0

See the follwoing solution [ Convert CSV to XML when CSV contains both character and number data ]

He proposes using regular expressions to Parse a CSV line using SplitCSV(line) instead of line.Split(",")

Community
  • 1
  • 1
SomeCode.NET
  • 917
  • 15
  • 33
  • Using regular expressions to parse CSV files can be very slow with all the look ahead and look behind checking. Also I'm not sure how well they would deal with escaped quotes for example "This field has a quote ("") and a comma (,) in" – Martin Brown Dec 16 '14 at 16:34
  • @MartinBrown The proposed solution works correctly, i had the same problem and i used regular expressions, it's not slow as you think, even with big data ! – SomeCode.NET Dec 16 '14 at 16:48
  • Because I'm on my sick bed today and a bit bored I did some performance tests. 1 million parses of "text,\"text with quote(\"\") and comma (,)\",text". This takes 755ms with my function 2,087ms with Thorsten Dittmer's solution and 9,761ms with the Regex solution this answer points to. That is 12 times slower! (Times averaged over three runs on a I7 Surface 3). Even if you move the Regex instantiation out of the loop and reuse it you still only get a figure of 2,682ms. – Martin Brown Dec 17 '14 at 16:31
0

Cinchoo ETL - an open source library simplifies the process of CSV to Xml files conversion.

For a sample CSV:

Id, Name, City
1, Tom, NY
2, Mark, NJ
3, Lou, FL
4, Smith, PA
5, Raj, DC

Using the code below you can produce Xml

string csv = @"Id, Name, City
    1, Tom, NY
    2, Mark, NJ
    3, Lou, FL
    4, Smith, PA
    5, Raj, DC";

StringBuilder sb = new StringBuilder();
using (var p = ChoCSVReader.LoadText(csv).WithFirstLineHeader())
{
    using (var w = new ChoXmlWriter(sb)
        .Configure(c => c.RootName = "Emps")
        .Configure(c => c.NodeName = "Emp")
        )
    {
        w.Write(p);
    }
}

Console.WriteLine(sb.ToString());

Output Xml:

<Emps>
  <Emp>
    <Id>1</Id>
    <Name>Tom</Name>
    <City>NY</City>
  </Emp>
  <Emp>
    <Id>2</Id>
    <Name>Mark</Name>
    <City>NJ</City>
  </Emp>
  <Emp>
    <Id>3</Id>
    <Name>Lou</Name>
    <City>FL</City>
  </Emp>
  <Emp>
    <Id>4</Id>
    <Name>Smith</Name>
    <City>PA</City>
  </Emp>
  <Emp>
    <Id>5</Id>
    <Name>Raj</Name>
    <City>DC</City>
  </Emp>
</Emps>

Checkout CodeProject article for some additional help.

Disclaimer: I'm the author of this library.

Cinchoo
  • 6,088
  • 2
  • 19
  • 34
-1

This seems a good alternative and may sort your issue: http://msdn.microsoft.com/en-GB/library/bb387090.aspx

// Create the text file.
string csvString = @"GREAL,Great Lakes Food Market,Howard Snyder,Marketing Manager,(503) 555-7555,2732 Baker Blvd.,Eugene,OR,97403,USA
HUNGC,Hungry Coyote Import Store,Yoshi Latimer,Sales Representative,(503) 555-6874,City Center Plaza 516 Main St.,Elgin,OR,97827,USA
LAZYK,Lazy K Kountry Store,John Steel,Marketing Manager,(509) 555-7969,12 Orchestra Terrace,Walla Walla,WA,99362,USA
LETSS,Let's Stop N Shop,Jaime Yorres,Owner,(415) 555-5938,87 Polk St. Suite 5,San Francisco,CA,94117,USA";
File.WriteAllText("cust.csv", csvString);

// Read into an array of strings.
string[] source = File.ReadAllLines("cust.csv");
XElement cust = new XElement("Root",
    from str in source
    let fields = str.Split(',')
    select new XElement("Customer",
        new XAttribute("CustomerID", fields[0]),
        new XElement("CompanyName", fields[1]),
        new XElement("ContactName", fields[2]),
        new XElement("ContactTitle", fields[3]),
        new XElement("Phone", fields[4]),
        new XElement("FullAddress",
            new XElement("Address", fields[5]),
            new XElement("City", fields[6]),
            new XElement("Region", fields[7]),
            new XElement("PostalCode", fields[8]),
            new XElement("Country", fields[9])
        )
    )
);
Console.WriteLine(cust);

This code produces the following output:

Xml
    <Root>
      <Customer CustomerID="GREAL">
        <CompanyName>Great Lakes Food Market</CompanyName>
        <ContactName>Howard Snyder</ContactName>
        <ContactTitle>Marketing Manager</ContactTitle>
        <Phone>(503) 555-7555</Phone>
        <FullAddress>
          <Address>2732 Baker Blvd.</Address>
          <City>Eugene</City>
          <Region>OR</Region>
          <PostalCode>97403</PostalCode>
          <Country>USA</Country>
        </FullAddress>
      </Customer>
      <Customer CustomerID="HUNGC">
        <CompanyName>Hungry Coyote Import Store</CompanyName>
        <ContactName>Yoshi Latimer</ContactName>
        <ContactTitle>Sales Representative</ContactTitle>
        <Phone>(503) 555-6874</Phone>
        <FullAddress>
          <Address>City Center Plaza 516 Main St.</Address>
          <City>Elgin</City>
          <Region>OR</Region>
          <PostalCode>97827</PostalCode>
          <Country>USA</Country>
        </FullAddress>
      </Customer>
      <Customer CustomerID="LAZYK">
        <CompanyName>Lazy K Kountry Store</CompanyName>
        <ContactName>John Steel</ContactName>
        <ContactTitle>Marketing Manager</ContactTitle>
        <Phone>(509) 555-7969</Phone>
        <FullAddress>
          <Address>12 Orchestra Terrace</Address>
          <City>Walla Walla</City>
          <Region>WA</Region>
          <PostalCode>99362</PostalCode>
          <Country>USA</Country>
        </FullAddress>
      </Customer>
      <Customer CustomerID="LETSS">
        <CompanyName>Let's Stop N Shop</CompanyName>
        <ContactName>Jaime Yorres</ContactName>
        <ContactTitle>Owner</ContactTitle>
        <Phone>(415) 555-5938</Phone>
        <FullAddress>
          <Address>87 Polk St. Suite 5</Address>
          <City>San Francisco</City>
          <Region>CA</Region>
          <PostalCode>94117</PostalCode>
          <Country>USA</Country>
        </FullAddress>
      </Customer>
    </Root>

Edit I didn't see the first problem before. Do some pre-processing on your CSV first, replacing the column separator.

Use this:

    var filePath = "Your csv file path here including name";
    var newFilePath = filePath + ".tmp";

    using (StreamReader vReader = new StreamReader(filePath))
    {
        using (StreamWriter vWriter = new StreamWriter(newFilePath, false, Encoding.ASCII))
        {
            int vLineNumber = 0;
            while (!vReader.EndOfStream)
            {
                string vLine = vReader.ReadLine();
                vWriter.WriteLine(ReplaceLine(vLine, vLineNumber++));
            }
        }
    }

    File.Delete(filePath);
    File.Move(newFilePath, filePath);

    Dts.TaskResult = (int)ScriptResults.Success;
}

protected string ReplaceLine(string Line, int LineNumber)
{
    var newLine = Line.Replace("\",\"", "|");
    newLine = newLine.Replace(",\"", "|");
    newLine = newLine.Replace("\",", "|");
    return newLine;
}
Hangarter
  • 582
  • 4
  • 12
  • it doesnt cover my case i.e if theres a comma inside quotes then split quotes not comma – Vinodh Dec 16 '14 at 16:19
  • You could parse your CSV file first and replace the correct commas by something else. Google cloud uses a thorn character but you could use pipe (|) per example. I would do a string replacement on the file and after that process it. – Hangarter Dec 16 '14 at 16:21
  • It seems as well that you have your strings surrounded by quotes. You can easily replace '",', '","' and ',"' by a pipe, then you use this pipe as column delimiter. – Hangarter Dec 16 '14 at 16:24
  • But that would just be extremely inefficient as you would then have to parse the file twice instead of once. Also it is common in CSV files to escape quotes as well by doubling them up so simply replacing '",' might still not give the correct result. – Martin Brown Dec 16 '14 at 16:43
-1

I've had the exact same issue with CSV files generated by Excel. The thing is (and this is good) that if the field content contains the separator, the content is quoted as in your example (if the content contains the quote character, too, it is doubled).

I also did not use a ready made parser, but implemented it as follows:

    private string[] ParseLine(string line, char fieldSeparator, char? textSeparator)
    {
        List<string> items = new List<string>();

        StringBuilder itemBuilder = new StringBuilder();
        bool textSeparatorFound = false;

        for (int i = 0; i < line.Length; i++)
        {
            // Get current character
            char currentChar = line[i];

            // In case it is a field separator...
            if (currentChar == fieldSeparator)
            {
                // a) Did we recognize a quote before => Add the character to the item
                if (textSeparatorFound)
                {
                    itemBuilder.Append(currentChar);
                }

                // b) We're not within an open quote => We've finished a field
                else
                {
                    string item = itemBuilder.ToString();
                    itemBuilder.Remove(0, itemBuilder.Length);

                    // Replace doubled text separators
                    if (textSeparator != null)
                    {
                        string replaceWhat = String.Concat(textSeparator, textSeparator);
                        string replaceWith = textSeparator.ToString();
                        item = item.Replace(replaceWhat, replaceWith);
                    }

                    items.Add(item);
                }
            }

            // If it is a quote character
            else if (currentChar == textSeparator)
            {
                // a) If we have no open quotation, we open one
                if (!textSeparatorFound)
                {
                    textSeparatorFound = true;
                }

                // b) If we have an open quotation we have to decide whether to close it or not
                else
                {
                    // If this character is followed by the field separator or the end of the string, 
                    // this ends a quoted block. Otherwise we just add it to the output to
                    // handle quoted quotes.
                    if (i < line.Length - 1 && line[i + 1] != fieldSeparator)
                        itemBuilder.Append(currentChar);
                    else
                        textSeparatorFound = false;
                }
            }

            // All other characters are appended to the current item
            else
                itemBuilder.Append(currentChar);
        }

        // All other text is just appended
        if (itemBuilder.Length > 0)
        {
            string item = itemBuilder.ToString();
            itemBuilder.Remove(0, itemBuilder.Length);

            // Remember to replace quoted quotes
            if (textSeparator != null)
            {
                string replaceWhat = String.Concat(textSeparator, textSeparator);
                string replaceWith = textSeparator.ToString();
                item = item.Replace(replaceWhat, replaceWith);
            }

            items.Add(item.Trim());
        }

        return items.ToArray();
    }
Thorsten Dittmar
  • 55,956
  • 8
  • 91
  • 139
-2

The problem with CSV is that it is an irregular language. That means that characters have different meanings depending upon what came before or after them in the character stream. As you have seen splitting using the string.Split method does not correctly identify commas in fields escaped with quotes.

While it is possible to do a crude parse of a CSV line using regular expressions and look back and look forward techniques these are often buggy and slow. This is because Regular Expressions were designed for regular languages. A better approach is to simply parse the characters using a simple function like this one:

using System;
using System.Collections.Generic;
using System.Text;

    class Program
    {
        static void Main(string[] args)
        {
            IList<string> fields = ParseCSVLine("text,\"text with quote(\"\") and comma (,)\",text");

            foreach (string field in fields)
            {
                Console.WriteLine(field);
            }
        }

        public static IList<string> ParseCSVLine(string csvLine)
        {
            List<string> result = new List<string>();
            StringBuilder buffer = new StringBuilder(csvLine.Length);

            bool inQuotes = false;
            char lastChar = '\0';

            foreach (char c in csvLine)
            {
                switch (c)
                {
                    case '"':
                        if (inQuotes)
                        {
                            inQuotes = false;
                        }
                        else
                        {
                            // This next if handles the case where 
                            // we have a doubled up quote
                            if (lastChar == '"')
                            {
                                buffer.Append('"');
                            }
                            inQuotes = true;
                        }
                        break;

                    case ',':
                        if (inQuotes)
                        {
                            buffer.Append(',');
                        }
                        else
                        {
                            result.Add(buffer.ToString());
                            buffer.Clear();
                        }
                        break;

                    default:
                        buffer.Append(c);
                        break;
                }

                lastChar = c;
            }
            result.Add(buffer.ToString());

            return result;
        }
    }

The above outputs:

text
text with quote(") and comma (,)
text
Martin Brown
  • 24,692
  • 14
  • 77
  • 122
  • 1
    Your solution fails to handle the case when the field value contains the `"` character literal. In this case it is doubled (at least by convention). – Thorsten Dittmar Dec 16 '14 at 16:41
  • It does, that is why it looks at the lastChar and compares it to a " before appending it and switching back to inQuotes mode. – Martin Brown Dec 16 '14 at 16:47
  • I see - you do a look behind instead of a look ahead. Got confused. Sorry to interrupt ;-) – Thorsten Dittmar Dec 16 '14 at 18:47
  • This is actually a slight modification of a piece of code that parses off of a stream. When using streams rather than strings it is easier to do a look behind than a look ahead. – Martin Brown Dec 17 '14 at 10:32