
So I'm currently using the following snippet in a C# WPF application to convert some XML data to CSV.

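```csharp
// Requires: using System.Collections.Generic; using System.IO; using System.Xml;

// The file contains many <XML> records with no single root element,
// so wrap them in a synthetic <Root> before parsing.
string text = File.ReadAllText(file);
text = "<Root>" + text + "</Root>";
XmlDocument doc = new XmlDocument();
doc.LoadXml(text);
XmlNodeList rows = doc.GetElementsByTagName("XML");

// "using" flushes and closes the writer when the loop is done.
using (StreamWriter write = new StreamWriter(FILENAME1))
{
    foreach (XmlNode row in rows)
    {
        List<string> children = new List<string>();

        // One CSV field per child node (elements and bare text alike).
        foreach (XmlNode child in row.ChildNodes)
        {
            children.Add(child.InnerText.Trim());
        }

        write.WriteLine(string.Join(",", children.ToArray()));
    }
}
```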

However, I've run into a problem. My input XML data looks something like the following (sorry, you have to scroll horizontally to see what the raw data actually looks like):

```xml
<XML><HEADER>1.0,770162,20121009133435,3,</HEADER>20121009133435,721,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,1872161156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
    tham ALL out. For some reason 
    that is not the case
    please press the on button 
    when trying to activate
    device codes also available on
list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML>
```

Now, the problem I'm encountering is that my output looks like the one below. Since it is a CSV file, I want each record to be on one single row. So how would I go about removing the line breaks from the raw data so that the output is on a single horizontal line? I'm lost as to how to approach this. Would `Replace(System.Environment.NewLine, "")` work? Any help will be appreciated!

```
1.0,770162,20121009133435,3,,20121009133435,721,5,1,0,0,0,00:00,00:00,,00032134826064957,4627,1,,1872161156,7,0,10000,1,0,5000000,0,10000000,0,1 ,,Keep it simple or spell
    tham ALL out. For some reason 
    that is not the case
    please press the on button 
    when trying to activate
    device codes also available on
list,,,20121009133435,00-1d-71-0a-71-80,-66,,,0,50
```
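On whether `Replace(System.Environment.NewLine, "")` can work: on Windows, `Environment.NewLine` is the two-character sequence `"\r\n"`, and `string.Replace` only matches that exact sequence and returns a new string rather than changing the original. If the parsed text happens to contain bare `"\n"` breaks (worth confirming in the debugger, as the comments below suggest), a `"\r\n"`-only replace changes nothing. The sketch below contrasts the two approaches; the class and method names are illustrative, and it replaces with a space instead of `""` so that words on adjacent lines don't run together.

```csharp
using System;

// Minimal sketch: string.Replace matches only the exact sequence it is given
// and returns a new string; the original string is never modified.
public static class NewlineReplaceDemo
{
    // Replaces only the two-character "\r\n" sequence (Environment.NewLine
    // on Windows); a lone "\n" or "\r" is left untouched.
    public static string ReplaceCrLfOnly(string s) => s.Replace("\r\n", "");

    // Handles all three line-break styles. "\r\n" is replaced first so that
    // a Windows break becomes one space rather than two.
    public static string ReplaceAnyLineBreak(string s) =>
        s.Replace("\r\n", " ").Replace("\n", " ").Replace("\r", " ");
}
```

For example, `NewlineReplaceDemo.ReplaceCrLfOnly("a\nb")` still contains the line break, while `ReplaceAnyLineBreak` removes it.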

EDIT:

Also note that my input file has several thousand records like the ones shown below:

```xml
<XML><HEADER>1.0,770162,20121009133435,3,</HEADER>20121009133435,721,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,1872161156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
        tham ALL out. For some reason 
        that is not the case
        please press the on button 
        when trying to activate
        device codes also available on
    list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML>
<XML><HEADER>2.0,773162,20121009133435,3,</HEADER>20121004133435,761,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,18735166156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
        tham ALL out. For some reason 
        that is not the case
        please press the on button 
        when trying to activate
        device codes also available on
    list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML>
```

... and so on.
sparta93
  • _"Would `Replace(System.Environment.NewLine, "")` work?"_ - how about you try it? :) – CodeCaster Jun 10 '15 at 15:38
  • @CodeCaster I did; it didn't work. Maybe I didn't use it correctly. – sparta93 Jun 10 '15 at 15:40
  • Maybe show that code then. – CodeCaster Jun 10 '15 at 15:40
  • If @CodeCaster's solution didn't work, how about `.Replace("\r\n", "").Replace("\n", "").Replace("\r", "")`? – bill Jun 10 '15 at 15:42
  • Please [stop trying to parse CSV](http://www.secretgeek.net/csv_trouble) using `String.Join` and `string.Split`. That **does not work**. There are [many](http://stackoverflow.com/questions/2081418/) [many](http://stackoverflow.com/questions/9642055/) working, tested CSV parsers that will do this correctly. – Dour High Arch Jun 10 '15 at 15:43
  • @CodeCaster `children = children.Replace(System.Environment.NewLine, "");` is essentially what I wrote before writing to the CSV file, but it didn't work, I guess because I can't call `Replace` on a list. – sparta93 Jun 10 '15 at 15:44
  • @DourHighArch - OP is not trying to parse a CSV file, he's trying to create one. – Tim Jun 10 '15 at 15:45
  • @sparta93 you can't call `Replace` on a list; do the replace on the actual string item itself: `child.InnerText.Trim().Replace(Environment.NewLine, "")` – bill Jun 10 '15 at 15:45
  • The articles I linked to apply equally well to creating CSV files. `String.Join(",", data)` does not work when `data` contains commas. – Dour High Arch Jun 10 '15 at 15:49
  • @bill I tried your approach and still get the same output. Check the edit. – sparta93 Jun 10 '15 at 15:56
  • Pause your code on `children.Add(child.InnerText.Trim());` and show us the raw string of `child.InnerText` in the inspector. – bill Jun 10 '15 at 15:59
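Following bill's point that `Replace` has to be applied to each string rather than to the `List<string>`, a per-item version of the question's inner loop might look like the sketch below. `RowBuilder` and `BuildCsvRow` are hypothetical names, and the helper still joins fields with plain commas, so it does not address the CSV-quoting concern raised in the comments.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class RowBuilder
{
    // Hypothetical helper: builds one CSV line from the child nodes of an
    // <XML> record, cleaning each string before it goes into the list.
    public static string BuildCsvRow(XmlNode row)
    {
        List<string> children = new List<string>();

        foreach (XmlNode child in row.ChildNodes)
        {
            // Replace is a string method, so it runs on each item's text,
            // not on the List<string> itself. "\r\n" is handled first.
            string value = child.InnerText
                .Replace("\r\n", " ")
                .Replace("\n", " ")
                .Replace("\r", " ")
                .Trim();
            children.Add(value);
        }

        return string.Join(",", children);
    }
}
```

In the question's loop this would replace the inner `foreach`, e.g. `write.WriteLine(RowBuilder.BuildCsvRow(row));`.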

1 Answer

Try

```csharp
// Requires: using System.Text.RegularExpressions;
children.Add(Regex.Replace(child.InnerText, "\\s+", " "));
```

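This doesn't depend on any specific newline character, and it also gets rid of the four spaces at the start of every continuation line. `\s` matches any whitespace character and `+` means one or more occurrences, so each run of whitespace (line breaks included) is replaced by a single space.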
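To see what the `\\s+` pattern does to the sample ADVISORY text, here is a minimal sketch (the class name is illustrative). The `.Trim()` call is an addition to the answer's one-liner: without it, a value such as the `CAREAREA` text `"1 "` keeps its trailing space, because the regex turns that run into a single space rather than removing it.

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceCollapse
{
    // Replaces every run of whitespace (spaces, tabs, \r, \n, ...) with one
    // space, as in the answer, then trims the ends (an added step).
    public static string Collapse(string s) =>
        Regex.Replace(s, "\\s+", " ").Trim();
}
```

Applied to `"Keep it simple or spell\r\n    tham ALL out."`, this yields `"Keep it simple or spell tham ALL out."`.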

sirdank
  • `children.Add(child.InnerText.Trim().Replace("\r\n", ""));` I tried this approach, but I still get the same output. Check the edit. – sparta93 Jun 10 '15 at 15:57
  • How about `children.Add(child.InnerText.Trim('\r', '\n'));`? This should remove singular `\r`, `\n`, and `\r\n` – LocEngineer Jun 10 '15 at 15:59
  • We need a sample string from the OP of what the raw data looks like in the inspector to answer his question completely. – bill Jun 10 '15 at 16:01
  • Yes, that works. Can you explain to me what the "\\s+" does? – sparta93 Jun 10 '15 at 16:07
  • The only complaint I would have with this solution, is that Regex.Replace has terrible performance compared to string.Replace so if you're using this on a very large file, you will notice longer run times. http://blogs.msdn.com/b/debuggingtoolbox/archive/2008/04/02/comparing-regex-replace-string-replace-and-stringbuilder-replace-which-has-better-performance.aspx – bill Jun 10 '15 at 16:21
  • @bill You are right and I don't anticipate it being better in any circumstances. However, it may be a little less bad as it is replacing several calls to `String.Replace()` (one each for `\r`, `\n`, etc). – sirdank Jun 10 '15 at 16:27
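If `Regex.Replace` does turn out to be too slow on a large file (no measurements are given here; this is only an alternative to try), the same whitespace collapse can be done in a single pass without a regex. A minimal sketch, assuming `char.IsWhiteSpace` is an acceptable definition of whitespace (it covers the spaces, tabs, `\r` and `\n` in this data); the class name is illustrative:

```csharp
using System.Text;

public static class WhitespaceCollapseNoRegex
{
    // Single pass: copies non-whitespace characters as-is and writes one
    // space per run of whitespace. Leading and trailing runs are dropped.
    public static string Collapse(string s)
    {
        var sb = new StringBuilder(s.Length);
        bool pendingSpace = false;

        foreach (char c in s)
        {
            if (char.IsWhiteSpace(c))
            {
                // Only note that a separator is needed; it is written when
                // the next non-whitespace character arrives. Nothing is
                // pending while the output is still empty (leading run).
                pendingSpace = sb.Length > 0;
            }
            else
            {
                if (pendingSpace)
                {
                    sb.Append(' ');
                    pendingSpace = false;
                }
                sb.Append(c);
            }
        }

        return sb.ToString();
    }
}
```

It drops into the question's loop as `children.Add(WhitespaceCollapseNoRegex.Collapse(child.InnerText));`. Note that `Trim('\r', '\n')`, suggested in one of the comments above, only strips those characters from the start and end of the string, so the line breaks in the middle of the ADVISORY text would survive it.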