
We are now trying to replace our Ruby data exporter with AWS Glue. I have an IBM database as the source and AWS MySQL as the target. My task is to import tables from three databases into the target. I have built:

  • a bronze layer with replicas of the three tables, all with the same structure;
  • a silver layer with all the data manipulations (union, column renaming and dropping duplicates). The result table is partitioned by a categorical column and takes 2 GiB as a snappy-compressed Parquet file, or 6 GiB as CSV (a rough sketch of this step follows the list).
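
To make the silver step concrete, here is a minimal sketch of what it does, assuming bronze_1, bronze_2 and bronze_3 are the three bronze replicas converted to Spark DataFrames (via toDF()); the column names and the S3 path are placeholders:

# Union the three bronze replicas (they share the same schema), rename
# a column and drop duplicate rows; OLD_NAME / new_name are placeholders.
silver = (
    bronze_1.union(bronze_2).union(bronze_3)
    .withColumnRenamed("OLD_NAME", "new_name")
    .dropDuplicates()
)

# Write the result partitioned by the categorical column as snappy Parquet
# (~2 GiB total in my case); the bucket and path are placeholders.
(
    silver.write
    .mode("overwrite")
    .partitionBy("category")
    .option("compression", "snappy")
    .parquet("s3://my-bucket/silver/result/")
)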

All imports into the bronze layer were parallelized with 'hashfield' in the connection_options parameters, roughly as in the sketch below.
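
For reference, each bronze-layer read looks roughly like this (a sketch, assuming the source tables are registered in the Glue Data Catalog; the database, table and hash column names are placeholders, and with from_catalog the hash options go into additional_options rather than connection_options):

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())

# Parallel JDBC read of one source table: 'hashfield' names the column the
# read is split on, 'hashpartitions' the number of parallel queries.
bronze_dyf = glueContext.create_dynamic_frame.from_catalog(
    database="source_db",              # placeholder catalog database
    table_name="source_table_1",       # placeholder source table
    additional_options={
        "hashfield": "id",             # placeholder hash column
        "hashpartitions": "10",
    },
    transformation_ctx="bronze_source_table_1",
)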

Now I need to load the result into an existing AWS MySQL table. Ideally I would import only the new rows, but a full-table migration is also acceptable.

The native code below takes too much time (the job was killed by a timeout after 120 minutes):

# Update target table
logger.info("===> Start updating the target table.")
MySQL_node = glueContext.write_dynamic_frame.from_catalog(
    frame=dfc,
    database="sandbox_db",
    table_name=f"sandbox_db_{target_table_name}",
)
logger.info("===> The target table update finished successfully.")

I've read those answers, but I'm new to Glue and can't figure out how to implement this in a serverless environment. As I understand it, all of this native code runs on the driver. I suppose I should write a UDF for it, but I have no idea how to do that. The Ruby script we ran before used the mysqlimport tool for the import, but I also have no idea how I could use it here.

Thank you for any help!

Vadim.M.
