1

I'm ultimately trying to read a file, store in a blob in a Cassandra database, then later write it back to disk. Was having issues, and simplified to the following code that demonstrates the issue. The same code works fine with .txt and .csv files, but doesn't work with an Excel spreadsheet. I'm not sure if it's an encoding issue, and how to fix it. I tried Encoding.Unicode, but that didn't work either.

static void testFileReadWrite()
{
    string filenameIn = @"c:\demo\Timesheet.xls";
    string filenameOut1 = @"c:\demo\Timesheet_Test1.xls";
    string filenameOut2 = @"c:\demo\Timesheet_Test2.xls";

    string fileContents1 = File.ReadAllText(filenameIn);
    File.WriteAllText(filenameOut1, fileContents1);

    string fileContents2 = File.ReadAllText(filenameIn, Encoding.Unicode);
    File.WriteAllText(filenameOut2, fileContents2, Encoding.Unicode);
}

Original file size was 536,576. Test1 was 1,282,808 and Test2 was 536,578. Obviously, I want the same size file out as I read in, then I will do a file compare on it to validate it. I also want the program to work with any file type, it's just that Excel was the first one that demonstrated the issue.

NealWalters
  • 17,197
  • 42
  • 141
  • 251
  • 2
    An Excel file is not a text file and you can't read it as such. – Equalsk Apr 12 '17 at 14:46
  • 2
    What are you trying to do? Excel files are binary (zipped XML actually). Why are you trying to treat it as *text*? – Panagiotis Kanavos Apr 12 '17 at 14:47
  • 1
    BTW the code is wrong for text files too. You are *changing* the files without realizing it. `ReadAllText` converts ASCII text to Unicode. `WriteAllText` writes out UTF8. If you want to read and write the exact same byts, use. `ReadAllBytes` – Panagiotis Kanavos Apr 12 '17 at 14:48
  • My bad, thought all bytes were "text" and could be represented by a string. Need to re-read definition of text then. I know a string isn't exactly an array of bytes, but isn't almost the same? – NealWalters Apr 12 '17 at 14:56
  • Also, does the program have to know if the file is unicode or not? Or will use Read/write AllBytes handle any type of file? – NealWalters Apr 12 '17 at 14:57
  • You can represent bytes as text characters, but it would not be representative of the real object at all, even a string. You'd only need to know the encoding if you were trying to convert the byte array back to its string representation or something, for the straight storing/retrieving of files this is not necessary. – Equalsk Apr 12 '17 at 15:07

2 Answers2

3

Reading and writing binary files should be done using File.ReadAllBytes and File.WriteAllBytes instead. Hope it helps!

This StackOverflow helps explain the difference between ReadAllBytes and ReadAllText: Why is File.ReadAllBytes result different than when using File.ReadAllText?

Community
  • 1
  • 1
Eugene Komisarenko
  • 1,533
  • 11
  • 25
-1

I would the 'OfficeOpenXml' library from Microsoft and read the excel file as you like.

//
using OfficeOpenXml;

//
read the file using file stream
using (var stream= new StreamReader("filePath"))
{
     // then use the stream to extract the excel specific data
     using (var excelPackage = new ExcelPackage(stream))
    {
        var sheet = excelPackage.Workbook.Worksheets[1];
        var cell1= sheet.Cells[1, 1].GetValue<string>();
    }
}

Then you can build up your string and store to anywhere you like.

Jegan
  • 1,227
  • 9
  • 16
  • Read the question, it said: "also want the program to work with any file type, it's just that Excel was the first one that demonstrated the issue.": – NealWalters Apr 12 '17 at 15:00
  • My bad, just the streamReader and the StreamWriter will do a good job if you like regardless of file type. – Jegan Apr 12 '17 at 15:03