0

I have 4 DataTables that I am trying to join, but can't figure out how to efficiently do so.

I got the first two tables to join, creating a 3rd object of apptDetails which is an IEnumerable of DataRows. I am having trouble getting it back to a DataTable so I can do more joins on it though. I am getting an error on apptDetails.CopyToDataTable() of: 'IEnumerable' does not contain a definition for 'CopyToDataTable' and no accessible extension method 'CopyToDataTable' accepting a first argument of type 'IEnumerable' could be found (are you missing a using directive or an assembly reference?)

DataTable customer = ETL.ParseTable("customer");
DataTable appointments = ETL.ParseTable("appointments");
IEnumerable apptDetails = from t1 in customer.AsEnumerable()
    join t2 in appointments.AsEnumerable() on Convert.ToInt32(t1["customerId"]) equals Convert.ToInt32(t2["customerId"])
    into tableGroup
     select new
       {
        customerId = t1["customerId"],
        TotalAppointments = tableGroup.Count(),
        appointment_missed = Convert.ToInt32(t1["MissedAppt"]),
        appointment_show_rate = (
                                    tableGroup.Count()>0 ? 
                                        Math.Round((1 - ((double)Convert.ToInt32(t1["MissedAppt"]) / (double)tableGroup.Count())),2)
                                        : 0
                                )

        };
DataTable dt = apptDetails.CopyToDataTable();

I had originally just used var apptDetails, but it looked like I needed more typecasting, so I tried things like the following:

 IEnumerable<DataRow> apptDetails
 IEnumerable<EnumerableRowCollection> apptDetails
 as well as:
 DataTable dt = apptDetails.CopyToDataTable<DataRow>();
 DataTable dt = apptDetails.CopyToDataTable<EnumerableRowCollection>();

I need to join the customer and appointments tables, and then tack on the new columns as well into a single flat table. What am I missing about how I am doing it, or is there a better way of doing it?

Performance is a factor as we are talking 20,000 customers and 80,000 appointments, plus there will be 2-3 more tables to join after this so I would like to learn the "Right" way of doing this using Linq to it's fullest.

Alan
  • 2,046
  • 2
  • 20
  • 43

2 Answers2

1

You should work more on separation of concern: separate your internal storage method for your data (DataTables) from your data handling (combine the data in your datatables using LINQ statements).

In your case, consider creating extension functions for DataTable: functions that convert a DataTable into IEnumerable<Customer> and IEnumerable<Appointment>, and functions that convert IEnumerable<Customer> / IEnumerableback into aDataTable`.

If you do that, it will be easier to recognize patterns and re-use code. Furthermore, if you change your data storage, for instance from DataTable to a CSV-file, or a database, or whatever, all you have to do is to write a function to make it IEnumerable / IQueryable, and your LINQ queries will still work.

See Extension methods demystified

static class DataTableExtensions
{
     public static IEnumerable<Customer> ToCustomers(this DataTable table)
     {
          ... // TODO: implement
     }
     public static DataTable ToDataTable(this IEnumerable<Customer> customers)
     {
          ... // TODO implement
     }

     // similar functions for Appointments and AppointmentDetails:
     public static IEnumerable<Appointment> ToAppointments(this DataTable table) {...}
     public static DataTable ToDataTable(this IEnumerable<Appointment> appointments) {...}
     public static IEnumerable<AppointmentDetails> ToAppointmentDetails(this DataTable table) {...}
     public static DataTable ToDataTable(this IEnumerable<AppointmentDetail> appointmentDetails) {...}

You know DataTables better than I do, so I'll leave the code to you. For help, see Convert DataTable to IEnumerable and Convert IEnumerable to DataTable

We need to write a function for your LINQ query. You can keep it as a bunch of LINQ statements, however, it will look neater, better readable, better testable, better re-usable if you write a function for this (after all: you know how to write extension functions by now:

public static IEnumerable<AppointmentDetail> ToAppointmentDetails(
    this IEnumerable<Customer> customers,
    IEnumerable<Appointment> appointments)
{
    return customers.GroupJoin(appointments,     // GroupJoin customer and appointments
        customer => customer.CustomerId,         // from every customer take the customerId,
        appointment => appointment.CustomerId,   // from every appointment take the CustomerId,
        // from every Customer with all his matching Appointments make one new AppointmentDetail 
        (customer, appointments => new AppointmentDetail 
        {
            CustomerId = customer.CustomerId,
            TotalAppointments = appointments.Count(),
            MissedAppointments = appointments
                 .Where(appointment => appointment.IsMissed)
                 .ToList(),
            ...
        });
}

Now put everything together:

Usage:

DataTable customerTable = ...
DataTable appointmentTable = ...
IEnumerable<Customer> customers = customerTable.ToCustomers();
IEnumerable<Appointment> appointments = appoitnmentTable.ToAppointments();

IEnumerable<AppointmentDetail> appointmentDetails = customers.ToAppointmentDetails(appointments);

DataTable appointmentDetailTables = appointmentDetails.ToDataTable(appointmentDetails);

Now doesn't this look much neater?

Note that only the last statement will actually do any enumerating. All earlier statements only create an IEnumerable, no enumeration is done. This is very similar as concatenating LINQ statements. In fact, if you really want, and you can convince your project leader that the code will be better readable, testable, maintainable (which I doubt), you can rewrite it in one statement, similar to concatenating LINQ statements. Don't think that this will improve processing speed:

DataTable appointmentDetailTable = customerTable.ToCustomers()
    .ToAppointmentDetails(appointmentTable.ToAppointments())
    .ToDataTable();

Because you separated your concernts, this code is much more re-usable. Slight changes won't affect your code very much If you decide that your Customers and Appointments are to be fetched from a database instead of from a DataTable, all you have to do is rewrite your ToCustomers and ToAppointments, all other functions will remain unchanged.

Harald Coppoolse
  • 28,834
  • 7
  • 67
  • 116
0

Not sure why this works when other methods didn't, but using this:

DataTable apptDetails = (from t1 in customer.AsEnumerable()
join t2 in appointments.AsEnumerable() on Convert.ToInt32(t1["customerId"]) equals Convert.ToInt32(t2["customerId"])
into tableGroup
 select new
   {
    customerId = t1["customerId"],
    TotalAppointments = tableGroup.Count(),
    appointment_missed = Convert.ToInt32(t1["MissedAppt"]),
    appointment_show_rate = (
                                tableGroup.Count()>0 ? 
                                    Math.Round((1 - ((double)Convert.ToInt32(t1["MissedAppt"]) / (double)tableGroup.Count())),2)
                                    : 0
                            )
    }).CopyToDataTable();

works after implementing the info from this page: https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/implement-copytodatatable-where-type-not-a-datarow

I still couldn't do the:

DataTable dt = apptDetails.CopyToDataTable();

but it works the other way.

Alan
  • 2,046
  • 2
  • 20
  • 43