31

Many methods in the API accept this parameter (transformation_ctx), with a default value of "".

Is it just a string marker? If so, what is its purpose?

Cherry
  • 2
    About downvoting or closing: I searched the documentation (17.01.2018) and did not find any description of this field :( Maybe somebody knows? – Cherry Jan 17 '18 at 12:03

3 Answers

16

Many of the AWS Glue PySpark dynamic frame methods include an optional parameter named transformation_ctx, which is used to identify state information for a job bookmark. If you do not pass in the transformation_ctx parameter, then job bookmarks are not enabled for a dynamic frame or table used in the method.

https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html
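
As a rough illustration of what to pass in, here is a minimal sketch. It assumes a hypothetical catalog database `sales_db`, table `orders`, and output path `s3://example-bucket/orders/`, and it takes an already constructed `GlueContext` as an argument. The only point is that each read or write call gets its own distinct string:

```python
def copy_new_orders(glue_context):
    """Read a catalog table and write it to S3, naming each step for bookmarks.

    `glue_context` is expected to be an awsglue.context.GlueContext created by
    the job script. The database, table, and path names are hypothetical.
    """
    # The source read gets its own unique transformation_ctx string.
    orders = glue_context.create_dynamic_frame.from_catalog(
        database="sales_db",
        table_name="orders",
        transformation_ctx="read_orders",
    )
    # The sink gets a different string, so the two steps are told apart.
    glue_context.write_dynamic_frame.from_options(
        frame=orders,
        connection_type="s3",
        connection_options={"path": "s3://example-bucket/orders/"},
        format="parquet",
        transformation_ctx="write_orders",
    )
    return orders
```

In a real job script this call would sit between `job.init(...)` and `job.commit()`, since the bookmark state is saved when the job commits.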

이재승
  • 4
    But what are you supposed to pass in? – codecitrus Jun 01 '18 at 18:41
  • I have the same question. What should I pass in the transformation_ctx parameter? – ljofre Apr 12 '19 at 20:09
  • 1
    It should be a `string` which is used as an ID for the bookmark. – clds Aug 15 '19 at 13:40
  • The linked docs say it should be "a unique identifier for the ETL operator instance". This leads me to believe a string UUID would be sufficient, but the definition of "ETL operator instance" is a bit unclear to me. – Matt Hancock Dec 28 '21 at 19:10
10

I think this is what is going on. I wish the AWS docs would explicitly state it.

Bookmarks alone would only let you pick up at the next piece of data (e.g. the next file in S3). But for a complex job with Dynamic Frames, the job itself is stateful. To resume processing, you need not only to pick up with the next piece of input, but also to restore the state you had built up within your Dynamic Frames during the last run. The transformation_ctx is like a filename for saving the Dynamic Frame state. You have to name it, because AWS Glue isn't going to analyze your script to figure out which dynamic frame invocation is which.

Inferred primarily from Tracking Processed Data Using Job Bookmarks, which is the same page that other answers linked, but has somewhat clarified text since they quoted it:

Many of the AWS Glue PySpark dynamic frame methods include an optional parameter named transformation_ctx, which is a unique identifier for the ETL operator instance. The transformation_ctx parameter is used to identify state information within a job bookmark for the given operator. Specifically, AWS Glue uses transformation_ctx to index the key to the bookmark state.
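
To make the "index the key to the bookmark state" idea concrete, here is a toy model in plain Python. It is not how AWS Glue is implemented; `ToyBookmarkStore` and the file names are invented for illustration, and the empty-string case follows the documented behaviour quoted in this thread rather than anything I verified:

```python
class ToyBookmarkStore:
    """Toy stand-in for bookmark state; NOT the real AWS Glue implementation."""

    def __init__(self):
        # Maps each transformation_ctx string to the items it already processed.
        self._seen = {}

    def new_items(self, transformation_ctx, items):
        """Return only the items this named step has not processed before."""
        if not transformation_ctx:
            # Empty context: no state is tracked, so everything is returned.
            return list(items)
        seen = self._seen.setdefault(transformation_ctx, set())
        fresh = [item for item in items if item not in seen]
        seen.update(fresh)
        return fresh


store = ToyBookmarkStore()
first_run = store.new_items("read_orders", ["a.csv", "b.csv"])
second_run = store.new_items("read_orders", ["a.csv", "b.csv", "c.csv"])
other_step = store.new_items("read_customers", ["a.csv"])
untracked = store.new_items("", ["a.csv", "b.csv"])
```

Here `second_run` holds only `c.csv`, because the state saved under `"read_orders"` already covers the first two files. `other_step` still sees `a.csv`, since a different name means separate state, which is why each operator needs its own unique string.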

Lorrin
  • Does it mean that if I do not pass transformation_ctx, the bookmark will not be enabled just for this DynamicFrame? I tried it, but AWS Glue still doesn't process all the data from this transformation, only the new data. – mdziob Apr 14 '22 at 15:22
4

As mentioned in this link, the transformation_ctx parameter is used for job bookmarks. If you don't want to enable job bookmarks, then don't pass the parameter.

Moreover, if you want to use job bookmarks, then enable the job bookmark parameter and pass a value via the transformation_ctx parameter.
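
For the "enable the job bookmark parameter" part, a small sketch of the job argument involved. The helper name `bookmark_arguments` is made up for illustration; the `--job-bookmark-option` key and its values are the ones AWS Glue uses for job arguments:

```python
def bookmark_arguments(enabled):
    """Build the job argument that switches job bookmarks on or off."""
    option = "job-bookmark-enable" if enabled else "job-bookmark-disable"
    return {"--job-bookmark-option": option}


# With bookmarks on, the job also needs transformation_ctx values in the script.
args_on = bookmark_arguments(True)
args_off = bookmark_arguments(False)
```

A dictionary like `args_on` could be passed as the `Arguments` of a boto3 `start_job_run` call for your own job, or the same option can be set in the job's configuration in the console.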

Subinoy