We are trying to replace our Ruby data exporter with AWS Glue. I have an IBM database as the source and MySQL on AWS as the target. I need to import tables from three databases into the target. So far I have built:
- a bronze layer with replicas of the three tables, all with the same structure,
- a silver layer with all the data manipulations (union, column renaming, and dropping duplicates). The resulting table is partitioned by a categorical column and takes 2 GiB as snappy-compressed Parquet or 6 GiB as CSV. A simplified sketch of this job is shown below.
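For context, the silver-layer job looks roughly like this (a simplified sketch; the bucket, table, and column names are placeholders, not the real ones):

```python
# Simplified sketch of the silver-layer job (table, column and path names are placeholders).
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glueContext = GlueContext(SparkContext.getOrCreate())

# Read the three bronze replicas from the Glue Data Catalog
df1 = glueContext.create_dynamic_frame.from_catalog(
    database="sandbox_db", table_name="bronze_table_1").toDF()
df2 = glueContext.create_dynamic_frame.from_catalog(
    database="sandbox_db", table_name="bronze_table_2").toDF()
df3 = glueContext.create_dynamic_frame.from_catalog(
    database="sandbox_db", table_name="bronze_table_3").toDF()

# Union the three tables, rename columns, drop duplicates
silver = (
    df1.unionByName(df2)
       .unionByName(df3)
       .withColumnRenamed("old_name", "new_name")
       .dropDuplicates()
)

# Persist as snappy-compressed Parquet, partitioned by the categorical column
silver.write.mode("overwrite") \
      .partitionBy("category") \
      .parquet("s3://my-bucket/silver/result_table/")
```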
All imports into the bronze layer were parallelized with the 'hashfield' option in the connection_options parameters.
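A single bronze read looked roughly like the sketch below. Here it is shown via `create_dynamic_frame.from_catalog` with `additional_options`; with `create_dynamic_frame.from_options` the same keys go into `connection_options`. The catalog table name and key column are placeholders:

```python
# Sketch of one bronze-layer import from the IBM source, parallelized by 'hashfield'
# (catalog table name and key column are placeholders).
bronze_1 = glueContext.create_dynamic_frame.from_catalog(
    database="sandbox_db",
    table_name="source_table_1",
    additional_options={
        "hashfield": "id",        # column used to split the read into parallel JDBC queries
        "hashpartitions": "10",   # number of parallel partitions
    },
    transformation_ctx="bronze_read_1",
)
```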
Now I need to load the result into an existing MySQL table on AWS. Ideally I would import only the new rows, but a full-table migration is also acceptable.
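To illustrate what I mean by "only new rows": something like the sketch below, assuming a hypothetical watermark column such as `updated_at` exists in both the silver result and the target table (the JDBC URL and credentials are placeholders):

```python
# Illustration only: keep rows newer than what the target already contains,
# assuming a hypothetical "updated_at" watermark column (URL and credentials are placeholders).
from pyspark.sql import functions as F

spark = glueContext.spark_session

# Read the current maximum watermark from the existing MySQL target over JDBC
max_loaded = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://my-target-host:3306/sandbox_db")
    .option("dbtable", "(SELECT MAX(updated_at) AS max_ts FROM target_table) AS t")
    .option("user", "my_user")
    .option("password", "my_password")
    .load()
    .collect()[0]["max_ts"]
)

# Keep only rows that are newer than what the target already has
new_rows = silver.filter(F.col("updated_at") > F.lit(max_loaded))
```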
Native code like the following takes too long (it was killed by a timeout after 120 minutes):
```python
# Update the target table
logger.info("===> Start updating the target table.")
MySQL_node = glueContext.write_dynamic_frame.from_catalog(
    frame=dfc,  # DynamicFrame with the silver-layer result
    database="sandbox_db",
    table_name=f"sandbox_db_{target_table_name}",
)
logger.info("===> The target table update finished successfully.")
```
I've read those answers, but I'm new to Glue and can't understand how to implement this in a serverless environment. As I understand it, all of this native code runs on the driver. I suppose I should write a UDF for it, but I have no idea how to do that. The Ruby script we used before did the import with the mysqlimport tool, but I also have no idea how I could use that from Glue.
Thank you for any help!