When using Singer taps and targets, we sometimes see the target creating columns for fields that are explicitly filtered out (deselected) in the tap.
Why does this happen, and how can we resolve the issue?
By way of background, Singer taps and targets communicate with each other by way of SCHEMA and RECORD message types.
SCHEMA messages are sent from the tap to the target first, and they tell the target what kind of tables need to be created. They allow the target to prepare the destination platform (if necessary) for the data which will arrive.
RECORD messages arrive after the SCHEMA message, and they contain the actual data.
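To make the two message types concrete, here is a minimal sketch of what a tap might write to stdout. The `users` stream, its fields, and the helper function names are made up for illustration; only the top-level message keys (`type`, `stream`, `schema`, `key_properties`, `record`) follow the Singer message format.

```python
import json


def schema_message(stream, schema, key_properties):
    """Build a SCHEMA message describing the shape of a stream."""
    return {
        "type": "SCHEMA",
        "stream": stream,
        "schema": schema,
        "key_properties": key_properties,
    }


def record_message(stream, record):
    """Build a RECORD message carrying one row of data."""
    return {"type": "RECORD", "stream": stream, "record": record}


# Hypothetical stream used only for this example.
users_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "email": {"type": "string"},
    },
}

# The SCHEMA message goes first, then the RECORD messages for that stream.
messages = [
    schema_message("users", users_schema, ["id"]),
    record_message("users", {"id": 1, "email": "a@example.com"}),
]

# Singer messages are exchanged as one JSON object per line.
lines = [json.dumps(message) for message in messages]
for line in lines:
    print(line)
```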
This symptom (columns being created even when the corresponding fields are deselected) occurs when SCHEMA messages are not filtered and are just passed, raw, from the source's data catalog. Ideally, SCHEMA messages should be filtered based on the same selection logic that RECORD messages are filtered on, but this is not always the case.
Then, because the SCHEMA messages arrive before the RECORD messages, the target will go ahead and create a destination column for all fields, even those which are not going to have data when RECORD messages arrive.
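The symptom can be reproduced with a toy model of a target. This is only a sketch: the two functions below are hypothetical stand-ins for a real target, and the `ssn` field is an invented example of a column that was deselected in the tap but still appears in the unfiltered SCHEMA message.

```python
def columns_created_by_target(messages):
    """Simulate a naive target: one column per property in each SCHEMA message."""
    columns = {}
    for message in messages:
        if message["type"] == "SCHEMA":
            columns[message["stream"]] = set(message["schema"]["properties"])
    return columns


def columns_with_data(messages):
    """Collect the fields that actually appear in RECORD messages."""
    seen = {}
    for message in messages:
        if message["type"] == "RECORD":
            seen.setdefault(message["stream"], set()).update(message["record"])
    return seen


# The tap filtered 'ssn' out of the RECORD message but not out of the SCHEMA message.
tap_output = [
    {
        "type": "SCHEMA",
        "stream": "users",
        "key_properties": ["id"],
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "email": {"type": "string"},
                "ssn": {"type": "string"},
            },
        },
    },
    {"type": "RECORD", "stream": "users", "record": {"id": 1, "email": "a@example.com"}},
]

created = columns_created_by_target(tap_output)
populated = columns_with_data(tap_output)

# Columns the target created that will never receive data.
empty_columns = created["users"] - populated["users"]
print(sorted(empty_columns))  # ['ssn']
```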
The most direct fix is for the tap developer to add filtering logic to SCHEMA messages, just as they have for RECORD messages. Most tap maintainers will accept an Issue or Pull Request on this topic. If the tap is built on Meltano's SDK, then SCHEMA messages will automatically be filtered along with RECORD messages, so another option is to port the tap to the SDK, or for the user to migrate to a variant of the tap that already uses the SDK.
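For a tap that is not built on the SDK, the fix could look roughly like the sketch below. It is not the SDK's implementation or any particular tap's code: the helper names are hypothetical, the selection rule is simplified (a field is kept if its catalog metadata has `inclusion` set to `automatic` or `selected` set to true), and only top-level properties are handled. The key point is that the same set of selected fields is applied to both the schema and the records.

```python
def selected_fields(catalog_metadata):
    """Return the top-level field names that are selected in the catalog metadata."""
    fields = set()
    for entry in catalog_metadata:
        breadcrumb = entry["breadcrumb"]
        # Field-level entries look like ["properties", "<field name>"].
        if len(breadcrumb) == 2 and breadcrumb[0] == "properties":
            metadata = entry["metadata"]
            # Simplified rule (an assumption): automatic fields are always kept,
            # other fields are kept only when explicitly selected.
            if metadata.get("inclusion") == "automatic" or metadata.get("selected") is True:
                fields.add(breadcrumb[1])
    return fields


def filter_schema(schema, fields):
    """Copy the schema, keeping only the selected properties."""
    kept = {name: spec for name, spec in schema["properties"].items() if name in fields}
    return dict(schema, properties=kept)


def filter_record(record, fields):
    """Keep only the selected fields of a record."""
    return {name: value for name, value in record.items() if name in fields}


# Hypothetical catalog metadata in which 'ssn' has been deselected.
metadata = [
    {"breadcrumb": [], "metadata": {"selected": True}},
    {"breadcrumb": ["properties", "id"], "metadata": {"inclusion": "automatic"}},
    {"breadcrumb": ["properties", "email"], "metadata": {"inclusion": "available", "selected": True}},
    {"breadcrumb": ["properties", "ssn"], "metadata": {"inclusion": "available", "selected": False}},
]
raw_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "email": {"type": "string"},
        "ssn": {"type": "string"},
    },
}
raw_record = {"id": 1, "email": "a@example.com", "ssn": "000-00-0000"}

fields = selected_fields(metadata)
filtered_schema = filter_schema(raw_schema, fields)
filtered_record = filter_record(raw_record, fields)

print(sorted(filtered_schema["properties"]))  # ['email', 'id']
print(sorted(filtered_record))  # ['email', 'id']
```

With the SCHEMA message built from `filtered_schema` rather than `raw_schema`, the target never sees the deselected field and so has no reason to create a column for it.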
Full disclosure: I work for Meltano and I work on Meltano's SDK for Singer Taps and Targets (https://sdk.meltano.com). I am also the author of several taps and targets.