
I am using PySpark on AWS and receive gigabytes of data every day that needs to be upserted: I want to look up each record's id in an existing table in the Glue database, update the row if the id already exists, and insert it if it does not.

Is it possible to do it in AWS glue?

Thanks!

Paras Pandey

2 Answers


Yes, you can use the AWS Glue PySpark extensions for this. The getSink call below writes the frame to S3 and, with enableUpdateCatalog=True and updateBehavior="UPDATE_IN_DATABASE", creates or updates the table definition and partitions in the Glue Data Catalog as part of the write:

# Write to S3 and update the Glue Data Catalog in the same job run
data_sink = glue_context.getSink(
    connection_type="s3",
    path="s3_path",
    updateBehavior="UPDATE_IN_DATABASE",
    partitionKeys=["partition_column"],
    compression="snappy",
    enableUpdateCatalog=True,
)
data_sink.setCatalogInfo(
    catalogDatabase=database_name,
    catalogTableName=table_name,
)
data_sink.setFormat("glueparquet")
data_sink.writeFrame(data_frame)  # data_frame must be a Glue DynamicFrame
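Note that the sink above keeps the catalog table up to date, but the row-level merge (replace existing ids, append new ones) still has to happen in your frame before writing. The merge semantics reduce to keying records by id; here is a minimal pure-Python sketch of that logic (field names and the upsert helper are illustrative, not part of the Glue API):

```python
def upsert(existing, incoming, key="id"):
    """Merge incoming records into existing ones: a row whose key already
    exists is replaced (update); a new key is appended (insert)."""
    merged = {row[key]: row for row in existing}  # index existing rows by id
    for row in incoming:
        merged[row[key]] = row  # overwrite on match, add otherwise
    return list(merged.values())

old = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
new = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
print(upsert(old, new))
# id 1 is kept, id 2 is updated, id 3 is inserted
```

In a real job you would express the same step in Spark, e.g. by unioning the new data with the existing table and de-duplicating on id before handing the result to writeFrame.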
Robert Kossendey

You can also run Athena queries from within the Glue job to implement the upsert logic. https://docs.aws.amazon.com/athena/latest/ug/querying-athena-tables.html
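As a sketch of this approach: assuming the target is an Apache Iceberg table (Athena's MERGE INTO only supports Iceberg, on engine version 3) and the new data has been staged as its own table, the upsert can be expressed as a single MERGE statement submitted via boto3. All table, column, and bucket names below are placeholders:

```python
def build_upsert_query(database, target, staging, key="id"):
    # Athena MERGE requires an Iceberg target table (engine v3).
    # "value" stands in for your real columns; list each column explicitly.
    return (
        f"MERGE INTO {database}.{target} AS t "
        f"USING {database}.{staging} AS s "
        f"ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET value = s.value "
        f"WHEN NOT MATCHED THEN INSERT ({key}, value) VALUES (s.{key}, s.value)"
    )

def run_upsert(database, target, staging, output_s3):
    import boto3  # imported here so the query builder stays dependency-free
    athena = boto3.client("athena")
    return athena.start_query_execution(
        QueryString=build_upsert_query(database, target, staging),
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )
```

If the table is plain Parquet rather than Iceberg, MERGE is not available and you would instead rewrite the affected partitions (e.g. via an INSERT INTO ... SELECT over the deduplicated union).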

Vikram Rawat