I want to load many tables from an AWS RDS MySQL server using Cloud Data Fusion; each table is larger than about 1 GB. I found a plugin called "Multiple Database Tables" for loading multiple tables, but my pipeline failed. Also, when I use the regular Database source I can inspect a table's schema, but with Multiple Database Tables I can't find a way to check the schema. How do I use this plugin? Or is there another way to load many tables in the Data Fusion service?
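For context, the "Multiple Database Tables" source is a CDAP plugin whose settings live as properties in the pipeline's exported JSON rather than in a schema editor. A minimal sketch of what that configuration might look like is below; property names such as `dataSelectionMode` and `tableNames` are assumptions based on the CDAP data-integrations plugin, and every value (host, credentials, table list) is a hypothetical placeholder:

```python
import json

# Sketch of a "Multiple Database Tables" batch-source stage as it might
# appear in a Data Fusion pipeline's exported JSON. Property names are
# assumptions based on the CDAP data-integrations plugin; all values are
# hypothetical placeholders.
multi_table_source = {
    "name": "MultiTableDatabase",
    "plugin": {
        "name": "MultiTableDatabase",
        "type": "batchsource",
        "properties": {
            "referenceName": "rds_mysql_tables",
            "connectionString": "jdbc:mysql://my-rds-host:3306/mydb",
            "jdbcPluginName": "mysql",          # JDBC driver uploaded to Data Fusion
            "user": "db_user",
            "password": "db_password",
            "dataSelectionMode": "allow-list",  # read only the listed tables
            "tableNames": "orders,customers,products",
        },
    },
}

# The UI cannot show a single schema because each table's schema is resolved
# at run time; records carry the source table name in an extra field, which a
# multi-table sink (e.g. "BigQuery Multi Table") can use for routing.
print(json.dumps(multi_table_source, indent=2))
```

This also explains the schema question: with many tables there is no single output schema to display, so multi-table sources are usually paired with a multi-table sink that routes records by table name.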
-
Could you rephrase your question? If I understand correctly, you want to load your AWS RDS tables into GCP BigQuery. What error did you get? Do you have any network restrictions? Do you allow a GCP <> AWS connection (are you using a VPN)? Could you provide the exact steps you have followed? – PjoterS Jul 14 '21 at 10:44
-
`@PjoterS` thank you for your comment. Exactly: I want to migrate many tables from my RDS MySQL to GCP Storage or BigQuery. I found a way to migrate one table in one pipeline, but I don't know how to migrate multiple tables in one pipeline. I understand that Multiple Database Tables is meant for this, but I don't know how to use it. In summary, I want to know how to use the Multiple Database Tables source to migrate many tables to Google Cloud Storage or BigQuery – user16436399 Jul 14 '21 at 14:47
-
[This](https://stackoverflow.com/questions/64368503/gcp-data-fusion-multiple-table-import?rq=1) is the same problem I'm having. – user16436399 Jul 14 '21 at 14:55
-
I was wondering whether you specifically need a pipeline, or whether you just need to migrate your DB from AWS to GCP. For example, [here](https://hevodata.com/learn/rds-to-bigquery/) you have an example of how to do it using `.CSV`. However, if you need a pipeline, did you see `Muhammad Izzuddin`'s tutorial: Building a Simple Batch Data Pipeline from AWS RDS to Google BigQuery — [Part 1: Setting UP AWS Data pipeline](https://medium.com/thelorry-product-tech-data/building-a-simple-batch-data-pipeline-from-aws-rds-to-google-bigquery-part-1-setting-up-aws-b7787ffb6805) – PjoterS Jul 19 '21 at 16:17
-
and [Part 2: Setting up BigQuery Transfer Service and Scheduled Query](https://medium.com/thelorry-product-tech-data/building-a-simple-batch-data-pipeline-from-aws-rds-to-google-bigquery-part-2-setting-up-6d2bbca75448). From what I understand, this should solve your issue. If not, could you please elaborate on what exact scenario you have? – PjoterS Jul 19 '21 at 16:17
1 Answer
I'm posting this as Community Wiki, as the OP didn't provide enough details to reproduce the issue, but the information below might help someone.

There are a few ways to get your data using Cloud Data Fusion: you can use a pipeline, a plugin, a driver, and a few others depending on your needs.
On the internet you can find two very well described guides with examples.

If you would like some information about using Cloud Data Fusion with GCP products, you should read Bahadir Bulut's guide, "How I used Google Cloud Data Fusion to create a data warehouse" (Part 1 and Part 2). Data Fusion also allows you to use 150+ preconfigured connectors and transformations, such as Amazon S3, SQS, Azure services, and many more.
Another well-described approach (which I guess would help the OP) is to configure both Amazon and GCP resources and use pipelines. This guide is "Building a Simple Batch Data Pipeline from AWS RDS to Google BigQuery — Part 1: Setting UP AWS Data pipeline" and its second part, "Building a Simple Batch Data Pipeline from AWS RDS to Google BigQuery — Part 2: Setting up BigQuery Transfer Service and Scheduled Query". In short, this guide describes 2 main steps:
- Extract data from `MySQL RDS` and bring it into `S3` using the `AWS Data Pipeline` service.
- From `S3`, bring the file into `BigQuery` using the `BigQuery Transfer Service`.
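The second step above can also be scripted. The sketch below assembles a `bq mk --transfer_config` command for the BigQuery Data Transfer Service's Amazon S3 connector; the parameter names follow the `amazon_s3` data source as I understand it, and every identifier (bucket, dataset, credentials) is a hypothetical placeholder:

```python
import json
import shlex

# Sketch of the S3 -> BigQuery step via the BigQuery Data Transfer Service.
# Parameter names follow the `amazon_s3` data source; all values below
# (bucket, dataset, keys) are hypothetical placeholders.
params = {
    "data_path": "s3://my-export-bucket/mydb/orders/*.csv",
    "destination_table_name_template": "orders",
    "file_format": "CSV",
    "access_key_id": "AKIA-EXAMPLE",        # AWS credentials with read access
    "secret_access_key": "REDACTED",
}

command = [
    "bq", "mk", "--transfer_config",
    "--data_source=amazon_s3",
    "--display_name=rds-orders-to-bq",
    "--target_dataset=my_dataset",
    f"--params={json.dumps(params)}",
]

# Print a copy-pasteable shell command (quoting handled by shlex).
print(" ".join(shlex.quote(part) for part in command))
```

One transfer config is needed per destination table, so for many tables this would typically be wrapped in a loop over table names.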
