3

From this thread, I got the basic info on how to parse CSV to create XML. Unfortunately, the text fields (all enclosed in quotes) sometimes contain commas, so line.split(',') gives me too many columns. I can't figure out how to parse the CSV so line.split(',') distinguishes between commas within a text field, and commas separating fields. Any thoughts on how to do that?

Thanks!

Community
  • 1
  • 1
EoRaptor013
  • 1,735
  • 4
  • 18
  • 31

6 Answers6

2

Go grab this code: http://geekswithblogs.net/mwatson/archive/2004/09/04/10658.aspx

Then replace line.Split(",") with SplitCSV(line), like:

var lines = File.ReadAllLines(@"C:\text.csv");

var xml = new XElement("TopElement",
   lines.Select(line => new XElement("Item",
      SplitCSV(line)
          .Select((column, index) => new XElement("Column" + index, column)))));

xml.Save(@"C:\xmlout.xml");

Note that the code at the link above is rather old, and probably could be cleaned up a bit using Linq, but it should do the trick.

Grinn
  • 5,370
  • 38
  • 51
  • Har! I'd vote you up twice if I could! And kudos to matt watson for the regex. I think it must take a special convolution in the brain to figure out regex -- I can't ever seem to get my head around it. – EoRaptor013 Dec 01 '10 at 14:08
  • Hi Grinn, I am using your code in my project for converting csv to xml. Is there any option that the xml data can be saved into database without saving to a file. I appreciate if you could provide some code. I am looking for something like xml.savetoDatabase("tablename"); – Henry Sep 13 '12 at 13:54
  • @Henry there isn't anything like `xml.saveToDatabase("tablename") but you could get the XML as a string using `xml.toString()`, then save that to a field in a table. – Grinn Oct 02 '12 at 19:24
1

Try FileHelpers.

The FileHelpers are a free and easy to use .NET library to import/export data from fixed length or delimited records in files, strings or streams.

Pavel Chuchuva
  • 22,633
  • 10
  • 99
  • 115
0

What about using the pipe character "|"? This often happens with CSV files and a better approach is to seperate on pipes.

Burt
  • 7,680
  • 18
  • 71
  • 127
  • Pipes or tilde would be better, but I have to deal with what the client sent. Also, the issue is that commas are embedded in some fields. I can't figure out how to replace the commas separating fields without hitting the commas within the fields. – EoRaptor013 Nov 30 '10 at 19:20
  • You can't is there no way you can request a change to a different character? – Burt Dec 01 '10 at 12:09
  • Are the columns a fixed size/charcter length? – Burt Dec 01 '10 at 12:24
0

If your CSV files are too complex to make writing your own parser practical, use another parser. The Office ACE OLEDB provider may already be available on your system, but may be overkill for your purposes. I haven't used any of the lightweight alternatives, so I can't speak to their suitability.

Simon
  • 1,197
  • 1
  • 6
  • 14
  • I think I may be forced into writing my own parser. I'm pretty sure a Regex guru could figure something out, but that sure ain't me. I'll end up having to iterate every comma to determine whether it's a delimiter or embedded. Can you spell UGLY? – EoRaptor013 Nov 30 '10 at 19:25
0

While I'm almost always against regular expression, here's a solution using it.

Assume you have data as such:

"first name","last name","phone number"
"john,jane","doe","555-5555"

Then, the following code:

string csv = GetCSV();  // will load your CSV, or the above data
foreach (string  line in csv.Split('\n'))
{
    Console.WriteLine("--- Begin record ---");
    foreach (Match m in Regex.Matches(line, "\".+?\""))            
        Console.WriteLine(m.Value);            
}

will output this:

--- Begin record ---
"first name"
"last name"
"phone number"
 --- Begin record ---
"john,jane"
"doe"
"555-5555"

But I would not recommend the Regex approach if you have like a 2 GB csv file.
So you can use that as your baseline for making up your XML records.

BeemerGuy
  • 8,139
  • 2
  • 35
  • 46
  • I'm not sure what you're going for here. Splitting a string that does NOT contain commas within fields is trivial. It's the blasted embedded commas that are messing me up. – EoRaptor013 Nov 30 '10 at 19:21
  • @EoRaptor013 -- this will work even if you include commas in the field values. I edited the post to include a comma in the example. – BeemerGuy Nov 30 '10 at 19:24
  • Ah! I see what you are doing here now. Unfortunately, not all fields are enclosed in quotes; specifically, numeric fields are not enclosed. But, thank you for pointing this out. – EoRaptor013 Nov 30 '10 at 19:55
  • @EoRaptor013 -- the only thing you're left with is to parse it character by character and determine opening and closing quotes. – BeemerGuy Nov 30 '10 at 20:14
0

Here's a little trick if you don't want to use Regex. Instead of spliting with comma you can split with comma and quotes together ","

Assuming there's no space before and after comma:

line.Split("\",\"")

You will need to remove the quote before the first field and after the last field however.

  • Unfortunately, the records contain numeric fields that are not enclosed in quotes. So, just going for commas surrounded by quotes won't delimit the numeric fields. – EoRaptor013 Nov 30 '10 at 19:32