Discard arbitrary header lines in BizTalk Flat File processing?

Question

I found these related threads, but they do not answer my question directly:

BizTalk - Flat file with Header multiple records and Footer - Disassemble problem

Removing header from a flat file in BizTalk

I'm dealing with an old system that delivers flat files with a very loose schema. In particular, the header consists of two lines: the first line is a title and the second line is column headers. All subsequent lines are valid records.

The problem is that when there are no records for that day, the column headers are omitted; in that case, we have the document title, and then a summary line (for human consumption) that notifies the reader that there are no records for that day.

Because the same file can have such different formats, I'm having a hard time creating a header schema that I can use in my flat file receive pipeline that will allow me to strip the header information off. Furthermore, since the header is multiple lines, it appears that I can't just use a carriage return delimiter.

I have tried two approaches to this:

A header schema that contains two carriage-return-delimited field elements, each of which is an opaque string
A header schema that contains two carriage-return-delimited records, each of which defines a dummy infix delimiter that will never exist in either line (resulting in one opaque string per record)

When I deploy these, BizTalk picks up the files and processes them, but no messages come out. This leads me to believe that BizTalk is treating the entire file as the header, so it finds no records.

The solution I'm trying to find is how to create a header schema that causes BizTalk to treat the first two lines of a file as the header, regardless of their contents, and discard them. Is this possible?

EDIT: Examples of the different files:

Records exist:

2017-02-27 19:27:03
CustomerName, OrderNumber, Expedite, ItemNumber, Count
CustomerA, O196801, 0, I232, 2
CustomerA, O196801, 0, I255, 1
CustomerB, O196802, 0, I237, 1
CustomerC, O196803, 0, I214, 1
CustomerC, O196803, 0, I232, 2

No records in this file:

2017-02-27 19:30:22
***EOF***

The first line is always the same, and can be described with a positionally delimited record.
The second line is either a comma-delimited list of column names, or this EOF line.
The EOF appears only when there are no records.

Currently I can process files that contain records, but only by defining the delimiter between the header and document schemas to be the entire column header line, i.e. CustomerName, OrderNumber, Expedite, ItemNumber, Count{CR}{LF}. However, this header schema fails for the empty file, where it finds ***EOF*** instead of the column header string.

Can you give us some sample messages and a sample schema where you have tried to solve the problem? Without those it is rather hard to help — Dijkgraaf, Feb 25 '17 at 01:32
Can you treat the first column name as a Tag? If so, your problem really doesn't sound that complicated. — Johns-305, Feb 27 '17 at 14:10
As Johns has said: create a record that has a Tag of CustomerName and a MinOccurs of 0, a repeating record with MinOccurs 0 for your data, and another record with a Tag of the EOF and a MinOccurs of 0. — Dijkgraaf, Feb 28 '17 at 20:19
@Dijkgraaf I really liked this idea! However, I'm not sure it will work--when I attempt it, BizTalk errors out claiming "Unable to match the data in the input stream" and the message it suspends is "***EOF***" by itself. My header schema has the Date line, MinOccurs 1, the ColumnHeaders line, Tag "CustomerName" MinOccurs 0, and the eof line Tag "***EOF" MinOccurs 0. The document schema contains the repeating record. I tried this with a variety of settings for Infix/Postfix Child Order on the header schema, as well as "Preserve Delimiter for Empty Data" and "Suppress Trailing Delimiters." — bwerks, Mar 02 '17 at 21:33
Also I should add that I have Header and Document schemas broken up so that the flat file disassembler will debatch my repeating records, by setting MaxOccurs 1 on the repeating record element in the document schema. — bwerks, Mar 02 '17 at 22:48

Dan Field · Answer 1 · 2017-02-28T14:36:23.400

There might be some clever way to handle this with the Flat File schema, but I can't think of it.

I would probably write a custom Decode component for the pipeline that inspects the start of the message for that ***EOF*** line. If it is there, just null out the stream (or perhaps rewrite it with the expected headers); if not, reset the position of the stream back to 0 and pass it along.

e.g. (note: untested code, but it probably works):

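// Requires: using System.IO; using Microsoft.BizTalk.Component.Interop;
// using Microsoft.BizTalk.Message.Interop; using Microsoft.BizTalk.Streaming;
public IBaseMessage Execute(IPipelineContext pContext, IBaseMessage pInMsg)
{
    if (pInMsg == null || pInMsg.BodyPart == null) return pInMsg;

    var stream = pInMsg.BodyPart.GetOriginalDataStream();
    if (stream == null) return pInMsg;

    // Wrap before touching Length or Position: both throw on a non-seekable stream.
    if (!stream.CanSeek)
    {
        stream = new ReadOnlySeekableStream(stream);
        pContext.ResourceTracker.AddResource(stream);
    }

    StreamReader reader = new StreamReader(stream);
    pContext.ResourceTracker.AddResource(reader);
    reader.ReadLine(); // date line
    string secondLine = reader.ReadLine(); // column headers or ***EOF*** (null if the file is shorter)
    if (secondLine != null && secondLine.Trim() == "***EOF***")
    {
        pInMsg.BodyPart.Data = null; // no records today: drop the body
    }
    else
    {
        stream.Position = 0; // rewind so the disassembler sees the whole file
        pInMsg.BodyPart.Data = stream;
    }
    return pInMsg;
}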
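
The "rewrite it with the expected headers" variant could look roughly like the sketch below. It is plain System.IO with no BizTalk types, so it can be tried outside a pipeline. HeaderNormalizer and Normalize are hypothetical names, and the code assumes a seekable, UTF-8 (or ASCII) stream with CR LF line endings; the two string constants are taken from the sample files in the question. Whether the disassembler then accepts a header followed by zero records depends on your document schema, so treat this as a starting point only.

```csharp
using System;
using System.IO;
using System.Text;

public static class HeaderNormalizer
{
    // Both values come from the sample files in the question.
    public const string EofMarker = "***EOF***";
    public const string ColumnHeaders = "CustomerName, OrderNumber, Expedite, ItemNumber, Count";

    // Hypothetical helper: returns a stream whose second line is the column-header
    // line. Assumes the input is seekable and UTF-8 (or ASCII) encoded.
    public static Stream Normalize(Stream input)
    {
        // leaveOpen: true keeps the caller's stream usable after the reader is disposed.
        using (var reader = new StreamReader(input, Encoding.UTF8, true, 1024, true))
        {
            string title = reader.ReadLine();   // date/title line
            string second = reader.ReadLine();  // column headers or EOF marker

            if (second == null || second.Trim() != EofMarker)
            {
                // File has records (or is shorter than expected): pass it through untouched.
                input.Position = 0;
                return input;
            }

            // Empty report: keep the title line and swap the EOF marker for the headers.
            byte[] bytes = Encoding.UTF8.GetBytes(title + "\r\n" + ColumnHeaders + "\r\n");
            return new MemoryStream(bytes);
        }
    }

    public static void Main()
    {
        byte[] empty = Encoding.UTF8.GetBytes("2017-02-27 19:30:22\r\n***EOF***\r\n");
        using (var result = new StreamReader(Normalize(new MemoryStream(empty))))
        {
            Console.Write(result.ReadToEnd());
        }
    }
}
```

In the Decode component you would call this on the seekable stream in place of the null assignment and set pInMsg.BodyPart.Data to the returned stream, so that a header schema delimited by the column header line matches both kinds of file.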
