I am transforming an Excel spreadsheet into a list of "Elements" (this is a domain term). During this transformation, I need to skip the header rows and throw out malformed rows that cannot be transformed.
Now comes the fun part. I need to capture those malformed records so that I can report on them. I constructed a crazy LINQ statement (below). These are extension methods hiding the messy LINQ operations on the types from the OpenXml library.
var elements = sheet
.Rows() <-- BEGIN sheet data transform
.SkipColumnHeaders()
.ToRowLookup()
.ToCellLookup()
.SkipEmptyRows() <-- END sheet data transform
.ToElements(strings) <-- BEGIN domain transform
.RemoveBadRecords(out discard)
.OrderByCompositeKey();
The interesting part starts at ToElements
, where I transform the row lookup to my domain object list (details: it's called an ElementRow
, which is later transformed into an Element
). Bad records are created with just a key (the Excel row index) and are uniquely identifiable vs. a real element.
public static IEnumerable<ElementRow> ToElements(this IEnumerable<KeyValuePair<UInt32Value, Cell[]>> map)
{
return map.Select(pair =>
{
try
{
return ElementRow.FromCells(pair.Key, pair.Value);
}
catch (Exception)
{
return ElementRow.BadRecord(pair.Key);
}
});
}
Then, I want to remove those bad records (it's easier to collect all of them before filtering). That method is RemoveBadRecords
, which started like this...
public static IEnumerable<ElementRow> RemoveBadRecords(this IEnumerable<ElementRow> elements)
{
return elements.Where(el => el.FormatId != 0);
}
However, I need to report the discarded elements! And I don't want to muddy my transform extension method with reporting. So, I went to the out parameter (taking into account the difficulties of using an out param in an anonymous block)
public static IEnumerable<ElementRow> RemoveBadRecords(this IEnumerable<ElementRow> elements, out List<ElementRow> discard)
{
var temp = new List<ElementRow>();
var filtered = elements.Where(el =>
{
if (el.FormatId == 0) temp.Add(el);
return el.FormatId != 0;
});
discard = temp;
return filtered;
}
And, lo! I thought I was hardcore and would have this working in one shot...
var discard = new List<ElementRow>();
var elements = data
/* snipped long LINQ statement */
.RemoveBadRecords(out discard)
/* snipped long LINQ statement */
discard.ForEach(el => failures.Add(el));
foreach(var el in elements)
{
/* do more work, maybe add more failures */
}
return new Result(elements, failures);
But, nothing was in my discard
list at the time I looped through it! I stepped through the code and realized that I successfully created a fully-streaming LINQ statement.
- The temp list was created
- The
Where
filter was assigned (but not yet run) - And the discard list was assigned
- Then the streaming thing was returned
When discard
was iterated, it contained no elements, because the elements weren't iterated over yet.
Is there a way to fix this problem using the thing I constructed? Do I have to force an iteration of the data before or during the bad record filter? Is there another construction that I've missed?
Some Commentary
Jon mentioned that the assignment /was/ happening. I simply wasn't waiting for it. If I check the contents of discard
after the iteration of elements
, it is, in fact, full! So, I don't actually have an assignment problem. Unless I take Jon's advice on what's good/bad to have in a LINQ statement.