5

There are times when we want to override a stream's key properties (primary key) or it's incremental replication key. What's the best way to do this in Meltano?

aaronsteers
  • 2,277
  • 2
  • 21
  • 38

2 Answers2

6

In your Meltano yaml file, you can add a table-key-properties and/or replication-key override using the metadata extra config.

Docs reference: https://docs.meltano.com/concepts/plugins#metadata-extra

Example yaml code:

extractors:
- name: tap-postgres
  metadata:
    some_stream_id:
      table-key-properties: [id]
      replication-key: created_at
      replication-method: INCREMENTAL

This example sets the primary key to id and the replication key to created_at.

Update regarding wildcards (2022-03-30)

Per @visch's comment, wildcards can also be used in stream names to match multiple streams at once. Such as in these example:

extractors:
- name: tap-postgres
  metadata:
    "*":
      # set all streams to "full table" mode
      replication-method: FULL_TABLE

And:

extractors:
- name: tap-postgres
  metadata:
    "*":
      # ignore primary keys for all streams
      table-key-properties: []
aaronsteers
  • 2,277
  • 2
  • 21
  • 38
  • 3
    For testing and just starting with singer or a new tap I like to use metadata: '*': replication-method: FULL_TABLE As that way I can just see the data and figure out the replication stuff later. Realized the comment is hard to read, https://gitlab.com/vischous/oracle2mssql/-/blob/master/oracle2mssql/meltano.yml#L46-48 is much easier – visch Mar 30 '22 at 11:22
  • 2
    Thanks, @visch! I've updated my answer using this example. – aaronsteers Mar 30 '22 at 20:19
  • @aaronsteers can you please correct your answer? Apparently, it was renamed from `key-properties` to `table-key-properties`. At least in version 2.7.0 – Andrey Tatarinov Oct 03 '22 at 12:57
1

After having followed @visch suggestion by using

  metadata:
    "*":
      replication-method: FULL_TABLE

from the logs (when running Meltano in debug mode with --log-level=debug) in discovery mode you should see something like:

metadata node for tap_stream_id 'mydb-mytable'

From the above you can get the correct collection name and then structure the meltano.yml accordingly:

extractors:
- name: tap-postgres

  select:
  - mydb-mytable.*

  metadata:
    mydb-mytable:
      table-key-properties: [id]
      replication-method: INCREMENTAL
      replication-key: created_at

In particular it was very important for me to add the select field as otherwise I was getting this error message:

ERROR message=Invalid replication method None! 
replication method needs to be either FULL_TABLE or INCREMENTAL 
Vzzarr
  • 4,600
  • 2
  • 43
  • 80