2

I have recently been tasked with writing a piece of software that will import Excel files.

The problem I am trying to solve is that my company has c100 clients and each supply a file in a different layout, in so much as the columns in a file will differ between clients but the pertinent information is there in each file.

This process is complicated due to the fact that certain operations need to be done to different files.

In 1 file, for example, a column needs to be inserted after a specifc column and then the result of a calculation needs to be placed into that column. In that same sheet an address is supplied across 9 columns, this address needs to be moved into the last 6 of the 9 columns and then have the first 3 columns removed.

What I don't want to do is write the processing logic for each file (c 100 as mentioned) and thereby get trapped into the drudge of having to maintain this code and be responsible for adding new customer files as they come in.

What I want to do is create a Rule or Processing engine of sorts whereby I can have basic rules like "Insert Column", "Remove Column", "Insert Calculation", "Format a, b, c, d, e & f Columns To Use d, e & f" - the reason being so that configuring the read and process of any new file can be done through a front-end piece of software by an end user (obviously with some training on what to do).

Is there a pattern or strategy that might fit this? I have read about Rules engines but the best examples of these are simple boolean comparisons like "Age = 15" or "Surname = 'Smith'" but can't find a decent example of doing something like "Insert Column after Column G" then "Put G - 125 in to Column H".

Any help here, or a pointer to a good approach, would be greatly appreciated.

Blorgbeard
  • 101,031
  • 48
  • 228
  • 272
  • If you can't identify any patterns that can be repeated across the files, then you are better off doing the formatting manually. Is there any way to provide a uniform template to all customers? Are you trying to import the data from Excel? Not sure why the C# tag is listed. It is pretty simple to write code to insert columns and modify formulas in a spreadsheet, but it gets much more difficult (and pointless) if you need a completely dynamic set of modifications. Maybe I am not understanding exactly what you are trying to do. – Ric Gaudet Aug 22 '17 at 21:51
  • I don't know if this is the answer that you're looking for, or what constraints you have, but what we would normally try to do is standardise the input, by providing a standard template for the customers to use. – ainwood Aug 22 '17 at 22:22

3 Answers3

1

Let me see if I can help you out here.

Correct me if I am wrong, but it seems like all your input and output files contain data in columns and columns only.

In that case, you should imagine your problem as a transformation of X input columns to Y output columns. For each client, you will need a configuration that will specify the transform. The configuration might look like below

Y1 = X1
Y2 = X1 + X2 
Y3 = X3 + " some string"

As you can see, your configuration lines are simply C# expressions. You can use the LINQ Expression class to build an expression tree from your transformation formulas. You can learn about Expressions here. These expressions can then be compiled and used to do the actual transform. If you think in terms of C#, you will build a static transform method that takes a list as input and returns a list as output for each client. When you use Expressions, you will have to parse the configuration files yourself.

You can also use the Roslyn Compiler Services, which can support proper C# syntax. This way, you can literally have a static method which can do the transform. This also relieves you of the parsing duties.

In either case, you will still have to deal with things like: should I expect the columns to be a string (which means your support needs to know explicitly instruct the configuration GUI to parse needed columns into numbers) or should I automatically convert number like fields into numbers (now support doesn't have to do extra configuration, but they might hit issues when dealing with columns which have numbers, like ID, but should be treated as a string to avoid any improper handling), etc.

In Summary, my approach is:

  • Create config file per client.
  • Convert the config file into C# method dynamically using Expressions or Roslyn
  • Provide a GUI for generating this config - this way the support person can easily specify the transform without knowing your special syntax (Expressions) or C# syntax (Roslyn). When saving config, you can generate one method per client in a single assembly (or separate assembly per client) and persist it. Let's call it client library.
  • Your main application can do all the standard stuff of reading from excel, validating, etc and then call the client library method to generate the output in a standard format, which can be further processed in your main application.

Hope you got the gist.

Edit: Adding some code to demonstrate. The code is a bit long-winded, but commented for understanding.

// this data represents your excel data
var data = new string[][] {
    new string [] { "col_1_1", "10", "09:30" },
    new string [] { "col_2_1", "12", "09:40" }
};

// you should read this from your client specific config file/section
// Remember: you should provide a GUI tool to build this config
var config = @"
            output.Add(input[0]);

            int hours = int.Parse(input[1]);
            DateTime date = DateTime.Parse(input[2]);
            date = date.AddHours(hours);
            output.Add(""Custom Text: "" + date);
";

// this template code should be picked up from a 
// non client specific config file/section
var code = @"
using System;
using System.Collections.Generic;
using System.Linq;

namespace ClientLibrary {
    static class ClientLibrary {
        public static List<string> Client1(string[] input) {
            var output = new List<string>();

            <<code-from-config>>

            return output;
        }
    }
}
";

// Inject client configuration into template to form full code
code = code.Replace(@"<<code-from-config>>", config);

// Compile your dynamic method and get a reference to it
var references = new MetadataReference[] {
    MetadataReference.CreateFromFile(typeof(object).Assembly.Location),
    MetadataReference.CreateFromFile(typeof(Enumerable).Assembly.Location)
};

CSharpCompilation compilation = CSharpCompilation.Create(
    null,
    syntaxTrees: new[] { CSharpSyntaxTree.ParseText(code) },
    references: references,
    options: new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));


MethodInfo clientMethod = null;
using (var ms = new MemoryStream()) {
    EmitResult result = compilation.Emit(ms);

    if (!result.Success) {
        foreach (Diagnostic diagnostic in result.Diagnostics) {
            Console.Error.WriteLine("{0}: {1}", diagnostic.Id, diagnostic.GetMessage());
        }
    } else {
        ms.Seek(0, SeekOrigin.Begin);
        Assembly assembly = Assembly.Load(ms.ToArray());
        clientMethod = assembly.GetType("ClientLibrary.ClientLibrary").GetMethod("Client1");
    }
}

if (clientMethod == null)
    return;

// Do transformation
foreach (string[] row in data) {
    var output = clientMethod.Invoke(null, new object[] { row }) as List<string>;
    Console.WriteLine(string.Join("|", output));
}

You will need some nuget libraries to compile this, and their corresponding using clauses

nuget install Microsoft.Net.Compilers   # Install C# and VB compilers
nuget install Microsoft.CodeAnalysis    # Install Language APIs and Services

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Reflection;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Emit;

As you notice, the only piece to worry about is the GUI to auto-generate the code for the transformation - which I have not provided here. If you want simple transforms, that should be very easy, but for a complex transform, it will be more involved

Vikhram
  • 4,294
  • 1
  • 20
  • 32
0

It sounds like you're expecting your end user to be technical-savvy enough to understand this configuration mechanism that you're going to write. If they can handle that level of technical detail, it might be simpler to give them an Excel book and an official excel template that contains all the columns that your import app needs and they can manually massage the data to the spec.

Otherwise, I would suggest some strategy design based pattern solution to build a library of "data massager" classes for known formats, and just add new classes as new formats are encountered. e.g.

public interface IClientDataImporter
{ 
    List<MyCustomRowStructure> Import(string filename); 
}

// client 1 importer
public class ClientOneImporter : IClientDataImporter
{
    public List<MyCustomRowStructure> Import(string filename)
    {
       var result = new List<MyCustomRowStructure>();
       // ..... insert custom logic here
       return result;
    }
}

// client 2 importer
public class ClientTwoImporter : IClientDataImporter
{
    public List<MyCustomRowStructure> Import(string filename)
    {
       var result = new List<MyCustomRowStructure>();
       // ..... insert custom logic here
       return result;
    }
}

// repeat up to however many formats you need

// then.....

public class ExcelToDatabaseImporter
{
    public void ImportExcelFile(string filename, string clientName)
    {
         var myValidData = GetClientDataImporter(clientName).Import(filename);
         StickMyDataToMyDatabase(myValidData); // this is where you would load the structure into the db... won't need to touch every time a new format is encountered
    }
    public IClientDataImporter GetClientDataImporter(string clientName)
    {
         switch (clientName):
            case "ClientOne":
                return new ClientOneImporter(); 
                break;
            case "ClientTwo":
                return new ClientTwoImporter(); 
                break;
            default:
                throw new ArgumentException("No importer for client");
                break;

    }
}
Goose
  • 546
  • 3
  • 7
-2

I would suggest you to maintain an xml configuration file for each excel file. The xml configuration has to read by a tool, may be a console application, and generate new CSV file, based on the xml configuration.

As XML configuration file can be easily edited by any text editor, users can update the same.

Venkataraman R
  • 12,181
  • 2
  • 31
  • 58