0

Quick overview: The main goal is to read in the data from a set date, from the row and get the ref number form the set date e.g. Start date.

For example if i just wanted the data from the date set to the 1st of last month and upwards.

I currently have to extract some data from the excel spreadsheet example below:

Start date  Ref number
29/07/2015  2342326
01/07/2016  5697455
02/08/2016  3453787
02/08/2016  5345355
02/08/2015  8364456
03/08/2016  1479789
04/07/2015  9334578

enter image description here

Output using EPPlus:

29/07/2015
2342326
29/07/2016
5697455
02/08/2016
3453787
02/08/2016
5345355
02/08/2015
8364456
03/08/2016
1479789
04/07/2015
9334578

This part is fine, but when i try to strip the output via a date range i'm getting errors, e.g. using LINQ i get the following error output.

An unhandled exception of type 'System.InvalidCastException' occurred in System.Data.DataSetExtensions.dll

Additional information: Specified cast is not valid.

LINQ code:

var rowsOfInterest = tbl.AsEnumerable()
.Where(row => row.Field<DateTime>("Start date") >= new DateTime(2016, 7, 1))
.ToList();

I've also tried to modify the from date range using the datatable:

DataRow[] result = tbl.Select("'Start date' >= #1/7/2016#");

But get the following error:

An unhandled exception of type 'System.Data.EvaluateException' occurred in System.Data.dll

Additional information: Cannot perform '>=' operation on System.String and System.Double.

The last attempt was to try and see if i can strip the date from within the loop.

Code used:

DateTime dDate;
row[cell.Start.Column - 1] = cell.Text;
string dt = cell.Text.ToString();

if (DateTime.TryParse(dt, out dDate))
{
    DateTime dts = Convert.ToDateTime(dt);
}

DateTime date1 = new DateTime(2016, 7, 1);

if (dDate >= date1)
{
    Console.WriteLine(row[cell.Start.Column - 1] = cell.Text);
}

This sort of works but just lists the set dates and not there values, which is understandable, If i take this route how would i get the dates with there values?

Output:

29/07/2016
02/08/2016
02/08/2016
03/08/2016

The full code example used:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Data.OleDb;
using System.Text.RegularExpressions;
using Microsoft.Office.Interop.Excel;
using System.Data;
using System.IO;

namespace Number_Cleaner
{
    public class NumbersReport
    {

        //ToDo: Look in to fixing the code so it filters the date correctly with the right output data.
        public System.Data.DataTable GetDataTableFromExcel(string path, bool hasHeader = true)
        {
            using (var pck = new OfficeOpenXml.ExcelPackage())
            {
                using (var stream = File.OpenRead(path))
                {
                    pck.Load(stream);
                }
                var ws = pck.Workbook.Worksheets.First();
                System.Data.DataTable tbl = new System.Data.DataTable();
                foreach (var firstRowCell in ws.Cells[1, 1, 1, ws.Dimension.End.Column])
                {
                    tbl.Columns.Add(hasHeader ? firstRowCell.Text : string.Format("Column {0}", firstRowCell.Start.Column));
                }
                var startRow = hasHeader ? 2 : 1;
                for (int rowNum = startRow; rowNum <= ws.Dimension.End.Row; rowNum++)
                {
                    var wsRow = ws.Cells[rowNum, 1, rowNum, ws.Dimension.End.Column];
                    DataRow row = tbl.Rows.Add();
                    foreach (var cell in wsRow)
                    {

                        DateTime dDate;
                        row[cell.Start.Column - 1] = cell.Text;
                        string dt = cell.Text.ToString();
                        //Console.WriteLine(dt);

                        if (DateTime.TryParse(dt, out dDate))
                        {
                            DateTime dts = Convert.ToDateTime(dt);
                        }

                        DateTime date1 = new DateTime(2016, 7, 1);

                        if (dDate >= date1)
                        {
                            Console.WriteLine(row[cell.Start.Column - 1] = cell.Text);
                        }

                        //Console.WriteLine(row[cell.Start.Column - 1] = cell.Text);
                    }
                }
                //var rowsOfInterest = tbl.AsEnumerable()
                 // .Where(row => row.Field<DateTime>("Start date") >= new DateTime(2016, 7, 1))
                 //.ToList();
                //Console.WriteLine(tbl);
                //DataRow[] result = tbl.Select("'Start date' >= #1/7/2016#");

                return tbl;
            }
       }

Modified from: How to match date to row then get the final column value using EPPlus?

Community
  • 1
  • 1
Mattlinux1
  • 777
  • 2
  • 13
  • 25

1 Answers1

1

Based on your code, you are storing everything in your DataTable as strings by calling cell.Text. But using that you are loosing valuable information - the cell data type. You are much better off using cell.Value which will either be a string or a double. With Excel, dates, integers, and decimal values are all stored as doubles.

The error you are seeing has to do with the fact that you store the values as string but query them like DateTime here:

.Where(row => row.Field<DateTime>("Start date") >= new DateTime(2016, 7, 1))

and here:

"'Start date' >= #1/7/2016#"

If you look at my post here: How to parse excel rows back to types using EPPlus you will see the helper function ConvertSheetToObjects which deals pretty much with what you are trying to do. With a little modification we can turn it into something that takes a WorkSheet and converts it to a DataTable. Like the Object converstion method you should still provide it the expected structure in the form as a DataTable passed it rather then having it try to guess it by casting cell values:

public static void ConvertSheetToDataTable(this ExcelWorksheet worksheet, ref DataTable dataTable)
{
    //DateTime Conversion
    var convertDateTime = new Func<double, DateTime>(excelDate =>
    {
        if (excelDate < 1)
            throw new ArgumentException("Excel dates cannot be smaller than 0.");

        var dateOfReference = new DateTime(1900, 1, 1);

        if (excelDate > 60d)
            excelDate = excelDate - 2;
        else
            excelDate = excelDate - 1;
        return dateOfReference.AddDays(excelDate);
    });

    //Get the names in the destination TABLE
    var tblcolnames = dataTable
        .Columns
        .Cast<DataColumn>()
        .Select(dcol => new {Name = dcol.ColumnName, Type = dcol.DataType})
        .ToList();

    //Cells only contains references to cells with actual data
    var cellGroups = worksheet.Cells
        .GroupBy(cell => cell.Start.Row)
        .ToList();

    //Assume first row has the column names and get the names of the columns in the sheet that have a match in the table
    var colnames = cellGroups
        .First()
        .Select((hcell, idx) => new { Name = hcell.Value.ToString(), index = idx })
        .Where(o => tblcolnames.Select(tcol => tcol.Name).Contains(o.Name))
        .ToList();


    //Add the rows - skip the first cell row
    for (var i = 1; i < cellGroups.Count(); i++)
    {
        var cellrow = cellGroups[i].ToList();
        var tblrow = dataTable.NewRow();
        dataTable.Rows.Add(tblrow);

        colnames.ForEach(colname =>
        {
            //Excel stores either strings or doubles
            var cell = cellrow[colname.index];
            var val = cell.Value;
            var celltype = val.GetType();
            var coltype = tblcolnames.First(tcol => tcol.Name ==  colname.Name).Type;

            //If it is numeric it is a double since that is how excel stores all numbers
            if (celltype == typeof(double))
            {
                //Unbox it
                var unboxedVal = (double)val;

                //FAR FROM A COMPLETE LIST!!!
                if (coltype == typeof (int))
                    tblrow[colname.Name] = (int) unboxedVal;
                else if (coltype == typeof (double))
                    tblrow[colname.Name] = unboxedVal;
                else
                    throw new NotImplementedException($"Type '{coltype}' not implemented yet!");
            }
            else if (coltype == typeof (DateTime))
            {
                //Its a date time
                tblrow[colname.Name] = val;
            }
            else if (coltype == typeof (string))
            {
                //Its a string
                tblrow[colname.Name] = val;
            }
            else
            {
                throw new DataException($"Cell '{cell.Address}' contains data of type {celltype} but should be of type {coltype}!");
            }
        });

    }

}

To use it on something like this:

enter image description here

You would run this:

[TestMethod]
public void Sheet_To_Table_Test()
{
    //https://stackoverflow.com/questions/38915006/stripping-data-from-a-epplus-output-from-a-date-range

    //Create a test file
    var fi = new FileInfo(@"c:\temp\Sheet_To_Table.xlsx");

    using (var package = new ExcelPackage(fi))
    {
        var workbook = package.Workbook;
        var worksheet = workbook.Worksheets.First();

        var datatable = new DataTable();
        datatable.Columns.Add("Col1", typeof(int));
        datatable.Columns.Add("Col2", typeof(string));
        datatable.Columns.Add("Col3", typeof(double));
        datatable.Columns.Add("Col4", typeof(DateTime));

        worksheet.ConvertSheetToDataTable(ref datatable);

        foreach (DataRow row in datatable.Rows)
            Console.WriteLine(
                $"row: {{Col1({row["Col1"].GetType()}): {row["Col1"]}" +
                $", Col2({row["Col2"].GetType()}): {row["Col2"]}" +
                $", Col3({row["Col3"].GetType()}): {row["Col3"]}" +
                $", Col4({row["Col4"].GetType()}):{row["Col4"]}}}");

        //To Answer OP's questions
        datatable
            .Select("Col4 >= #01/03/2016#")
            .Select(row => row["Col1"])
            .ToList()
            .ForEach(num => Console.WriteLine($"{{{num}}}"));
    }
}

Which gives this in the output:

row: {Col1(System.Int32): 12345, Col2(System.String): sf, Col3(System.Double): 456.549, Col4(System.DateTime):1/1/2016 12:00:00 AM}
row: {Col1(System.Int32): 456, Col2(System.String): asg, Col3(System.Double): 165.55, Col4(System.DateTime):1/2/2016 12:00:00 AM}
row: {Col1(System.Int32): 8, Col2(System.String): we, Col3(System.Double): 148.5, Col4(System.DateTime):1/3/2016 12:00:00 AM}
row: {Col1(System.Int32): 978, Col2(System.String): wer, Col3(System.Double): 668.456, Col4(System.DateTime):1/4/2016 12:00:00 AM}
{8}
{978}
Community
  • 1
  • 1
Ernie S
  • 13,902
  • 4
  • 52
  • 79
  • Sorry for the late reply. the information given is very informative, I do have one problem though and that is one the " worksheet.ConvertSheetToDataTable(ref datatable);" line i get the following error: – Mattlinux1 Aug 15 '16 at 08:25
  • Error 25 'OfficeOpenXml.ExcelWorksheet' does not contain a definition for 'ConvertSheetToDataTable' and no extension method 'ConvertSheetToDataTable' accepting a first argument of type 'OfficeOpenXml.ExcelWorksheet' could be found (are you missing a using directive or an assembly reference? – Mattlinux1 Aug 15 '16 at 08:25
  • Changed the line to ConvertSheetToDataTable(worksheet, ref datatable); – Mattlinux1 Aug 15 '16 at 09:06
  • I have a second error message: An unhandled exception of type 'System.NullReferenceException' occurred on line 71. Looks like its reading from the cell below? and not stopping on the populated line. I've tried breaking the loop if null but will not work? – Mattlinux1 Aug 15 '16 at 09:56
  • The code with the problem: //Add the rows - skip the first cell row for (var i = 1; i < cellGroups.Count(); i++) { var cellrow = cellGroups[i].ToList(); var tblrow = dataTable.NewRow(); dataTable.Rows.Add(tblrow); colnames.ForEach(colname => { //Excel stores either strings or doubles var cell = cellrow[colname.index]; var val = cell.Value; – Mattlinux1 Aug 15 '16 at 09:57
  • 1
    @Mattlinux1 The first error has to do with it be an extension method but what you did will work just as well. The second one is harder to diagnose. If you think it is reading beyond where it should you can modify the `cellGroup` creation with something that checks the data more carefully. Post a screenshot of your excel file if you cant get anywhere. – Ernie S Aug 15 '16 at 11:58
  • It's pretty odd, why it's going to A9 when i run the code. it should stop on the populated cells? this is why i'm getting the null value. I've added a screenshot of the file. – Mattlinux1 Aug 15 '16 at 12:20
  • When i do a console.writeline(); on cell and vel. i get the output of: A2 42214.0472800926 A3 42580.0472800926 A4 42584.0472800926 A5 42584.0472800926 A6 42218.0472800926 A7 42585.0472800926 A8 42189.0472800926 A9 – Mattlinux1 Aug 15 '16 at 12:36
  • So it's defiantly is hitting A9 and because it's null it will throw an expatiation. All i need to know/do is how to i exit a List loop, break will not work? e.g. i could do an: if (val == null){ exit loop; } etc? – Mattlinux1 Aug 15 '16 at 12:38
  • 1
    @Mattlinux1 Take a look at that cell, I wonder if it has an empty string in it? Try right-clicking and deleteing the entire row in excel. – Ernie S Aug 15 '16 at 13:14
  • 1
    @Mattlinux1 I updated my code above to do a better job of looking for something like that. – Ernie S Aug 15 '16 at 13:23
  • Seem's too have worked, i added the updated code also re deleted the A9 line in excel and it now reads in the values. The only thing that is not working is the date not being striped down to the first of the month e.g. using the var dateOfReference = new DateTime(2016, 7, 1); – Mattlinux1 Aug 15 '16 at 13:38
  • e.g. removing any ref numbers, older than the 1st of the previous month. – Mattlinux1 Aug 15 '16 at 13:53
  • 1
    @Mattlinux1 Thats should be straight forward, just use a combo of datatable and linq selectors. See my edit above. – Ernie S Aug 15 '16 at 13:55
  • Could not convert to string so had to do a work around. But inst working. also $ is invalid. – Mattlinux1 Aug 15 '16 at 14:08
  • Code i'm working with:foreach (DataRow row in datatable.Rows) { DateTime dDate; string dt = row["start date"].ToString(); if (DateTime.TryParse(dt, out dDate)) { DateTime dts = Convert.ToDateTime(dt); } – Mattlinux1 Aug 15 '16 at 14:09
  • DateTime date1 = new DateTime(2016, 7, 1); datatable .Select(dDate >= date1) .Select(row => row["Ref number"]) .ToList() .ForEach(num => Console.WriteLine(num)); } } – Mattlinux1 Aug 15 '16 at 14:09
  • 1
    Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/120969/discussion-between-ernie-and-mattlinux1). – Ernie S Aug 15 '16 at 14:18