1

Trying to convert xlsx file to csv via C# code. I am able to convert file successfully but it could not handle newline and line break properly. Please advice which separator to use so that my csv is generated correctly

csv conversion code could not convert correctly. It should complete the conversion of a column completely but didn't.

using the below code to convert xlsx file to csv

string file= @"fakepath\abc.xlsx";

DataSet result = new DataSet();

//------To read the xlsx file
if (file.EndsWith(".xlsx"))
            {
                // Reading from a binary Excel file (format; *.xlsx)
                FileStream stream = File.Open(file, FileMode.Open, FileAccess.Read);
                IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
                result = excelReader.AsDataSet();
                excelReader.Close();
            }

//-------To convert the file into csv format
while (row_no < result.Tables[0].Rows.Count)
            {
                for (int i = 0; i < result.Tables[0].Columns.Count; i++)
                {
                    a += result.Tables[0].Rows[row_no][i].ToString() + ",";
                }
                row_no++;
                a += "\r\n";
            }

string output = @"fakepath\abc.csv";
            StreamWriter csv = new StreamWriter(@output, false);
            csv.Write(a);
            csv.Close();

column in excel holds data as

  1. First Line

  2. Second Line

  3. Third Line

excel point i.e. 1,2 and 3 are in same column but different lines

Expected output csv formmat 1. First Line 2. Second Line 3. Third Line

Please help

2 Answers2

0

if what you want is

         +-------+-------+-----+-----+-----+
         |  a    |       |     |     |     |
         |  b    |       |     |     |     |
         |  c    |       |     |     |     |
         +---------------------------------+
         |  d    |       |     |     |     |
         |  e    |       |     |     |     |
         |  f    |       |     |     |     |
         +---------------------------------+
         |  g    |       |     |     |     |
1        |  h    |       |     |     |     |
         |       |       |     |     |     |
         +---------------------------------+
         |       |       |     |     |     |


         +-------+------+------+-----+-----+
         |  a    |   b  |   c  |     |     |
         |       |      |      |     |     |
         +---------------------------------+
         |  d    |  e   |  f   |     |     |
         |       |      |      |     |     |
         +---------------------------------+
         |  g    |  h   |      |     |     |
 2       |       |      |      |     |     |
         +---------------------------------+
         |       |      |      |     |     |
         |       |      |      |     |     |


         +-------+------+------+-----+-----+
         |  abc  |      |      |     |     |
         |       |      |      |     |     |
         +---------------------------------+
         |  def  |      |      |     |     |
 3       |       |      |      |     |     |
         +---------------------------------+
         |  gg   |      |      |     |     |
         |       |      |      |     |     |
         +---------------------------------+
         |       |      |      |     |     |

for 1 to 2 USE

var column = result.Tables[0].Rows[row_no][i].ToString();
column.Split(new[] { '\r', '\n' }).ToList()
.ForEach(column_line => { a += column_line + ","; });

for 1 to 3 USE

var column = result.Tables[0].Rows[row_no][i].ToString();
a += String.Join(" ", column.Split(new[] { '\r', '\n' }));

also for 1 to 3 you can USE

var column = result.Tables[0].Rows[row_no][i].ToString();
column = column.Replace("\r", " ");
a+= column.Replace("\n", " ");

Full code

string file = @"fakepath\abc.xlsx";

DataSet result = new DataSet();

//------To read the xlsx file
if (file.EndsWith(".xlsx"))
{
    // Reading from a binary Excel file (format; *.xlsx)
    FileStream stream = File.Open(file, FileMode.Open, FileAccess.Read);
    IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
    result = excelReader.AsDataSet();
    excelReader.Close();
}

//-------To convert the file into csv format
while (row_no < result.Tables[0].Rows.Count)
{
    for (int i = 0; i < result.Tables[0].Columns.Count; i++)
    {
        var column = result.Tables[0].Rows[row_no][i].ToString();
        //column.Split(new[] { '\r', '\n' }).ToList().ForEach(column_line => { a += column_line + ","; });
        or
        a+=String.Join(" ", column.Split(new[] { '\r', '\n' }));
        //or
        //column = column.Replace("\r", " ");
        //a+= column.Replace("\n", " ");

    }
    row_no++;
    a += Environment.NewLine;
    //or
    // a += "\r\n";
}

string output = @"fakepath\abc.csv";
StreamWriter csv = new StreamWriter(@output, false);
csv.Write(a);
csv.Close();

More

also

a +=Environment.NewLine

instead of

 a +="\r\n";
Mohamed Elrashid
  • 8,125
  • 6
  • 31
  • 46
  • Current Output 1. First Line,2. Second Line,3. Third Line Expected output 1. First Line 2. Second Line 3. Third Line In excel all these three points are in 1 column, if all values will be separated with comma then if I import it in SQL then I will need 3 columns but it is 1 column value – Hrishikesh Bagchi Jan 05 '19 at 06:36
  • do you still need help ?\ – Mohamed Elrashid Jan 07 '19 at 22:44
0

The main issue here is the most common issue people have when processing CSV files: they think all that is required is to concatenate the values with commas, and put a newline at the end of each line. There are fringe cases they don't think about like:

  1. What if a value contains newline?
  2. What if a value contains comma?
  3. Once they've overcome 1 and 2, what if the value contains "?

I'd suggest the best approach would be to use a library to help create the CSV file. But if you don't want to do that, here are some suggestions which may improve things:

  1. Each value needs to be enclosed in " characters.
  2. As a result of point 1, " characters within the value need to be escaped with a backslash character before them.
  3. The declaration of a is not shown, but I assume it to be a string. Strings are immutable, so you should use System.Text.StringBuilder to construct the string.
  4. The code in the question will always end up with an extra comma on the end of each line. Consider using string.Join to put commas between the values, so you don't get one on the end.
  5. Streams are IDisposable, so each one should be in a using block. Once you've done that, you don't need to Close the stream, as it will be closed by the implicit Dispose when it exits the block.
  6. Rather than while (row_no < result.Tables[0].Rows.Count) prefer foreach (DataRow dataRow in result.Tables[0].Rows). It's easier to read what the loop is doing, and puts the row in a local variable, rather than having to do the property lookup for Tables and the indexer lookup for the table, and the property lookup for Rows and the indexer lookup each time within the loop. And it makes the contents of the loop easier to read.
  7. Use System.Environment.NewLine in preference to "\r\n". This is just cosmetic, again, to make it human-readable.

So that would give you...

string file= @"fakepath\abc.xlsx";

DataSet result = new DataSet();

//------To read the xlsx file
if (file.EndsWith(".xlsx"))
{
    // Reading from a binary Excel file (format; *.xlsx)
    using (FileStream stream = File.Open(file, FileMode.Open, FileAccess.Read))
    {
        IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
        result = excelReader.AsDataSet();
        excelReader.Close();
    }
}

//-------To convert the file into csv format
var a = new StringBuilder();
int columnCount=result.Tables[0].Columns.Count;
foreach (DataRow dataRow in result.Tables[0].Rows)
{
    var values=new List<string>(columnCount);
    for (int i = 0; i < columnCount; i++)
    {
        values.Add(dataRow[i].ToString());
    }
    string line=string.Join(",", values.Select(v=>$"\"{Escape(v)}\"");
    a.AppendLine(line); // even neater than Environment.NewLine
}

string output = @"fakepath\abc.csv";
using(StreamWriter csv = new StreamWriter(@output, false))
{
    csv.Write(a.ToString());
}

I'm not at my PC, so cut me some slack if there are typos or autocorrect errors in the above. Also: I have omitted the definition of method string Escape(string v), which needs to do what is described in my point 2.

Richardissimo
  • 5,596
  • 2
  • 18
  • 36