
I have an Excel workbook with multiple sheets that all share the same data schema. I have a working implementation that loads data from a single sheet.

Is there a way to merge records with the same schema from all sheets into a single set of rows, using JoinOperation or a similar operation?

My understanding is that JoinOperation can be used for left, right, outer and inner joins, but not for a union, since the return type of MergeRows is Row.

Thanks in advance.

Sandeep G B

1 Answer


You could implement AbstractOperation to combine multiple input operations like this:

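using System.Collections.Generic;
using Rhino.Etl.Core;
using Rhino.Etl.Core.Operations;

public class UnionAllOperation : AbstractOperation {
    // Source operations whose rows are concatenated, in the order they were added.
    private readonly List<IOperation> _operations = new List<IOperation>();

    // The incoming rows are ignored; each added operation is executed with a
    // null input, so it is expected to be a source (extract) operation.
    public override IEnumerable<Row> Execute(IEnumerable<Row> rows) {
        foreach (var operation in _operations)
            foreach (var row in operation.Execute(null))
                yield return row;
    }

    // Returns this instance so that calls can be chained.
    public UnionAllOperation Add(IOperation operation) {
        _operations.Add(operation);
        return this;
    }
}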

Update: See the parallel version here.

Use it in a process like this:

public class Process : EtlProcess {
    protected override void Initialize() {

        Register(
            new UnionAllOperation()
                .Add(new ExtractFromExcel("WorkBook1.xls"))
                .Add(new ExtractFromExcel("WorkBook2.xls"))
        );
    }
}

This performs a union all, so duplicate rows are kept. If you need a union that returns only distinct rows, subclass AbstractAggregationOperation and group on all columns.
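A minimal sketch of that distinct step, assuming AbstractAggregationOperation lets you override GetColumnsToGroupBy and Accumulate (check the exact signatures against your Rhino ETL version). DistinctOperation, DistinctProcess and the column names "Id" and "Name" are hypothetical; ExtractFromExcel is the same extract operation used above.

```csharp
using Rhino.Etl.Core;
using Rhino.Etl.Core.Operations;

// Hypothetical operation: emits one row per distinct combination of the given columns.
public class DistinctOperation : AbstractAggregationOperation {
    private readonly string[] _columns;

    // Pass every column of the schema to get a true "distinct rows" union.
    public DistinctOperation(params string[] columns) {
        _columns = columns;
    }

    // Rows with equal values in these columns share one aggregate row.
    protected override string[] GetColumnsToGroupBy() {
        return _columns;
    }

    // Copy the grouped columns into the aggregate; repeated rows simply
    // overwrite it with identical values, so each group is emitted once.
    protected override void Accumulate(Row row, Row aggregate) {
        foreach (var column in _columns)
            aggregate[column] = row[column];
    }
}

public class DistinctProcess : EtlProcess {
    protected override void Initialize() {
        // The union feeds the distinct step.
        Register(
            new UnionAllOperation()
                .Add(new ExtractFromExcel("WorkBook1.xls"))
                .Add(new ExtractFromExcel("WorkBook2.xls"))
        );
        Register(new DistinctOperation("Id", "Name")); // assumed column names
    }
}
```

Note that an aggregation has to see every row before it can emit anything, so the distinct step holds one row per group in memory, unlike the streaming union all.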

dalenewman