I created an AWS Glue job using Glue Studio. It reads data from a Glue Data Catalog table, applies some transformations, and writes to a different Data Catalog table.
When configuring the target node, I enabled the option that is supposed to create new partitions in the Data Catalog after each run.
The job runs successfully and the data is written to S3 with the proper Hive-style partition folder structure (e.g. .../date=2021-06-01/provider=xyz/), but no new partitions are created in the target Data Catalog table: I still have to run a Glue Crawler to register them.
The code in the generated script that is responsible for partition creation is this (last two lines of the job):
DataSink0 = glueContext.write_dynamic_frame.from_catalog(
    frame=Transform4,
    database="tick_test",
    table_name="test_obj",
    transformation_ctx="DataSink0",
    additional_options={
        "enableUpdateCatalog": True,  # should let the job update the Data Catalog itself
        "updateBehavior": "LOG",
        "partitionKeys": ["date", "provider"],
    },
)
job.commit()
What am I doing wrong? Why are new partitions not being created? How do I avoid having to run a crawler to have the data available in Athena?
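In case it matters, the pattern I keep seeing in the AWS documentation for updating the Data Catalog from within a job uses getSink with setCatalogInfo rather than write_dynamic_frame.from_catalog. Here is a rough, untested sketch of what I think that would look like for my job (the S3 path is a placeholder, and I am assuming glueparquet output):

sink = glueContext.getSink(
    connection_type="s3",
    path="s3://my-target-bucket/test_obj/",  # placeholder, not my real path
    enableUpdateCatalog=True,
    updateBehavior="UPDATE_IN_DATABASE",  # unlike LOG, this also updates the table schema
    partitionKeys=["date", "provider"],
)
sink.setFormat("glueparquet")  # assuming Parquet output
sink.setCatalogInfo(catalogDatabase="tick_test", catalogTableName="test_obj")
sink.writeFrame(Transform4)
job.commit()

Is switching to this sink (or changing updateBehavior) actually required for new partitions to be registered, or should the generated write_dynamic_frame.from_catalog call work as-is?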
I am using Glue 2.0 with PySpark 2.4.