We are trying to replace our Ruby data exporter with AWS Glue. I have an IBM database as the source and MySQL on AWS as the target. I need to import tables from three databases into the target. So far I have built:
- a bronze layer with replicas of the three tables, all with the same structure,
- a silver layer with all the data manipulations (union, column renaming, and dropping duplicates). The resulting table is partitioned by a categorical column and takes 2 GiB as snappy-compressed Parquet or 6 GiB as CSV. A simplified sketch of this job is shown below.
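For context, the silver-layer job looks roughly like this (a simplified sketch; the bucket, table, and column names are placeholders, not the real ones):

```python
# Simplified sketch of the silver-layer job (table, column and path names are placeholders).
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glueContext = GlueContext(SparkContext.getOrCreate())

# Read the three bronze replicas from the Glue Data Catalog
df1 = glueContext.create_dynamic_frame.from_catalog(
    database="sandbox_db", table_name="bronze_table_1").toDF()
df2 = glueContext.create_dynamic_frame.from_catalog(
    database="sandbox_db", table_name="bronze_table_2").toDF()
df3 = glueContext.create_dynamic_frame.from_catalog(
    database="sandbox_db", table_name="bronze_table_3").toDF()

# Union the three tables, rename columns, drop duplicates
silver = (
    df1.unionByName(df2)
       .unionByName(df3)
       .withColumnRenamed("old_name", "new_name")
       .dropDuplicates()
)

# Persist as snappy-compressed Parquet, partitioned by the categorical column
silver.write.mode("overwrite") \
      .partitionBy("category") \
      .parquet("s3://my-bucket/silver/result_table/")
```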
All imports into the bronze layer were parallelized with the 'hashfield' option in the connection_options parameters.
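A single bronze read looked roughly like the sketch below. Here it is shown via `create_dynamic_frame.from_catalog` with `additional_options`; with `create_dynamic_frame.from_options` the same keys go into `connection_options`. The catalog table name and key column are placeholders:

```python
# Sketch of one bronze-layer import from the IBM source, parallelized by 'hashfield'
# (catalog table name and key column are placeholders).
bronze_1 = glueContext.create_dynamic_frame.from_catalog(
    database="sandbox_db",
    table_name="source_table_1",
    additional_options={
        "hashfield": "id",        # column used to split the read into parallel JDBC queries
        "hashpartitions": "10",   # number of parallel partitions
    },
    transformation_ctx="bronze_read_1",
)
```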
Now I need to load the result into an existing MySQL table on AWS. Ideally I would import only the new rows, but a full-table migration is also acceptable.
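To illustrate what I mean by "only new rows": something like the sketch below, assuming a hypothetical watermark column such as `updated_at` exists in both the silver result and the target table (the JDBC URL and credentials are placeholders):

```python
# Illustration only: keep rows newer than what the target already contains,
# assuming a hypothetical "updated_at" watermark column (URL and credentials are placeholders).
from pyspark.sql import functions as F

spark = glueContext.spark_session

# Read the current maximum watermark from the existing MySQL target over JDBC
max_loaded = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://my-target-host:3306/sandbox_db")
    .option("dbtable", "(SELECT MAX(updated_at) AS max_ts FROM target_table) AS t")
    .option("user", "my_user")
    .option("password", "my_password")
    .load()
    .collect()[0]["max_ts"]
)

# Keep only rows that are newer than what the target already has
new_rows = silver.filter(F.col("updated_at") > F.lit(max_loaded))
```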
Native code like the following takes too long (it was killed by a timeout after 120 minutes):
```python
# Update the target table
logger.info("===> Start updating the target table.")
MySQL_node = glueContext.write_dynamic_frame.from_catalog(
    frame=dfc,  # DynamicFrame with the silver-layer result
    database="sandbox_db",
    table_name=f"sandbox_db_{target_table_name}",
)
logger.info("===> The target table update finished successfully.")
```
I've read those answers, but I'm new to Glue and can't understand how to implement this in a serverless environment. As I understand it, all of this native code runs on the driver. I suppose I should write a UDF for it, but I have no idea how to do that. The Ruby script we used before did the import with the mysqlimport tool, but I also have no idea how I could use that from Glue.
Thank you for any help!