
I set up a small data stack for my company. It uses Airbyte to extract and load data from external services into a BigQuery data warehouse, and it works well: we now have a large amount of raw data and have started exploring it with our dataviz tool.

To keep our findings maintainable, we set up a dbt project to store our queries and help the team improve them.

However, we hit a limitation: while Airbyte supports running a dbt transformation for a single source, there is no way (afaik) to configure it for joining tables from multiple sources, since I have no way to ensure that source B has been synced when the transformation attached to source A runs.

What are the best practices for such use cases? Are there any tools I can plug into this stack to improve data transformation with dbt?

Trent
  • If you have your data loaded into BigQuery through Airbyte, you should then be able to easily join data from multiple sources (e.g. through dbt, which works on top of BQ in the end). Why do you think there's no way to do so? – Aleix CC Aug 29 '23 at 07:49
  • @AleixCC the thing is, when you set up a dbt transformation in Airbyte, the transformation (dbt) runs once the data has been loaded for that source. So if I need to combine data from another source, I cannot be sure that the other source has been synced. I don't know how to ensure that both sources have been synced **before** running the transformation. – Trent Aug 29 '23 at 08:01
  • Oh I see! Yes, in that case you will need an orchestrator as Noel mentioned below. A tool that, for example, will let you run airbyte syncs and, once completed, run the relevant dbt transformations – Aleix CC Aug 29 '23 at 17:04

1 Answer


You need an orchestrator that triggers the Airbyte syncs first and then runs dbt once they have completed. Check out services like Astronomer or Datacoves.
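To make the idea concrete, here is a minimal sketch of the gating logic an orchestrator provides, written in plain Python rather than in any particular tool. Everything in it is an assumption for illustration: `run_pipeline` and `run_dbt_build` are hypothetical helper names, and each sync callable stands in for whatever code triggers one Airbyte connection and blocks until that sync finishes.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def run_dbt_build(project_dir: str) -> bool:
    """Run `dbt build` for the given project; True if it exits with code 0."""
    result = subprocess.run(["dbt", "build", "--project-dir", project_dir])
    return result.returncode == 0


def run_pipeline(
    syncs: Iterable[Callable[[], bool]],
    transform: Callable[[], bool],
) -> str:
    """Run every sync, then run the transformation only if all syncs succeeded.

    Each sync is a hypothetical callable that triggers one source sync,
    waits for it to finish, and returns True on success.
    """
    sync_list = list(syncs)
    # Sources are independent of each other, so they can sync in parallel.
    with ThreadPoolExecutor(max_workers=max(1, len(sync_list))) as pool:
        results = list(pool.map(lambda sync: sync(), sync_list))

    # Gate: dbt must not join tables from a source that failed to sync.
    if not all(results):
        return "skipped"
    return "transformed" if transform() else "transform_failed"
```

In practice you would pass one callable per Airbyte connection and something like `lambda: run_dbt_build("path/to/project")` as the transformation. A real orchestrator adds scheduling, retries, and alerting on top of this same dependency rule.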

noel_g