1

Is it possible to add the data from multiple table of selective columns of mysql rds database to single table with another mysql rds instance using AWS Glue.

Please suggest.

Thanks

Somnath Rokade
  • 655
  • 1
  • 9
  • 27
  • Are these multiple tables have a common column to perform join and if not you are looking to union the data? Because union requires all the datasets to have same type and number of columns – Prabhakar Reddy Jan 07 '20 at 04:52
  • Yes correct. Multiple tables have have a common column to perform join – Somnath Rokade Jan 07 '20 at 05:10

1 Answers1

1

Yes it is possible to achieve this with Glue via two approaches:

First approach:

  1. Run a Glue crawler on all these tables and load all these tables in to your Glue job from Glue catalog.

  2. Once you have loaded them in to multiple Glue DynamicFrames then you can select the columns along with the join key and join them.

  3. Then join these DynamicFrames and write the combined result back into MySql RDS table.

In this approach you will be loading all the columns from multiple tables and then selecting required columns inside your Glue job and join them.

Second Approach:

  1. You can frame a SQL query to select and join all these multiple tables and push it down to MySQL engine.

  2. The result is then calculated at MySQL engine and you will load this result into Spark DatFrame.

  3. Final step will be to convert this DataFrame to DynamicFrame and writing it to MySQL table.

In this approach you are delegating the computing task to MySQL and there will be impact on database engine if your tables are too big.

Prabhakar Reddy
  • 4,628
  • 18
  • 36