What is the difference? I know that DynamicFrame was created for AWS Glue, but AWS Glue also supports DataFrame. When should DynamicFrame be used in AWS Glue?
- You may also want to use a dynamic frame just for the ability to load from the supported sources such as S3 and use job bookmarking to capture only new data each time a job runs. – Kyle Oct 24 '18 at 21:50
- DynamicFrames take more time for processing than DataFrames. – sweety Mar 13 '23 at 06:17
4 Answers
DynamicFrame is safer when handling memory-intensive jobs. "The executor memory with AWS Glue dynamic frames never exceeds the safe threshold," while on the other hand, a Spark DataFrame could hit an "Out of memory" issue on executors. (https://docs.aws.amazon.com/glue/latest/dg/monitor-profile-debug-oom-abnormalities.html)
DynamicFrames are designed to provide maximum flexibility when dealing with messy data that may lack a declared schema. Records are represented in a flexible self-describing way that preserves information about schema inconsistencies in the data.
For example, with changing requirements, an address column stored as a string in some records might be stored as a struct in later rows. Rather than failing or falling back to a string, DynamicFrames track both types and give users a number of options for resolving these inconsistencies, providing fine-grained resolution options via the ResolveChoice transform.
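As a minimal sketch of what that resolution can look like in a Glue script (the frame `dyf` and the "address" column are assumed to already exist; `make_struct` is just one of the available choice options):

```python
from awsglue.transforms import ResolveChoice

# `dyf` is assumed to be a DynamicFrame whose "address" column holds both
# string and struct values. ResolveChoice can keep both representations in
# a struct ("make_struct"), split them into separate columns ("make_cols"),
# cast everything to one type ("cast:string"), or keep only one ("project:struct").
resolved = ResolveChoice.apply(
    frame=dyf,
    specs=[("address", "make_struct")],
    transformation_ctx="resolve_address",
)
resolved.printSchema()
```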
DynamicFrames also provide a number of powerful high-level ETL operations that are not found in DataFrames. For example, the Relationalize transform can be used to flatten and pivot complex nested data into tables suitable for transfer to a relational database. In addition, the ApplyMapping transform supports complex renames and casting in a declarative fashion.
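A rough sketch of how those two transforms are typically invoked (the frame `dyf`, the S3 staging path, and the column names here are placeholder assumptions):

```python
from awsglue.transforms import ApplyMapping, Relationalize

# Relationalize flattens nested structures and pivots arrays into separate
# tables; it returns a DynamicFrameCollection keyed by table name.
flattened = Relationalize.apply(
    frame=dyf,
    staging_path="s3://my-bucket/temp/",  # placeholder staging location
    name="root",
    transformation_ctx="relationalize",
)
root = flattened.select("root")

# ApplyMapping renames and casts columns declaratively:
# (source column, source type, target column, target type).
mapped = ApplyMapping.apply(
    frame=root,
    mappings=[
        ("id", "long", "customer_id", "long"),
        ("name", "string", "customer_name", "string"),
    ],
    transformation_ctx="apply_mapping",
)
```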
DynamicFrames are also integrated with the AWS Glue Data Catalog, so creating frames from tables is a simple operation. Writing to databases can be done through connections without specifying the password. Moreover, DynamicFrames are integrated with job bookmarks, so running these scripts in the job system allows the script to implicitly keep track of what was read and written. (https://github.com/aws-samples/aws-glue-samples/blob/master/FAQ_and_How_to.md)
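For context, a minimal sketch of a Catalog read with bookmarks enabled might look like this (the database and table names are placeholders, and bookmarks must also be enabled on the job itself):

```python
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext.getOrCreate())
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# The transformation_ctx ties this read to the job bookmark state, so only
# data not seen by previous runs is picked up.
dyf = glueContext.create_dynamic_frame.from_catalog(
    database="my_database",      # placeholder Catalog database
    table_name="my_table",       # placeholder Catalog table
    transformation_ctx="datasource0",
)

# ... transforms and writes go here ...

job.commit()  # persists the bookmark state for the next run
```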

- Great summary. I'd also add that DyFs are a high-level abstraction over Spark DFs and are a great place to start; most of the generated code will use the DyF. When something advanced is required, you can easily convert to a Spark DF, continue, and convert back to a DyF if needed. The biggest downside is that it is a proprietary API, so you can't pick up your code and easily run it on another vendor's Spark cluster like Databricks, Cloudera, Azure etc. – Davos Aug 06 '19 at 12:56
You can refer to the documentation here: DynamicFrame Class. It says,
A DynamicFrame is similar to a DataFrame, except that each record is self-describing, so no schema is required initially.
You want to use a DynamicFrame when:
- Your data does not conform to a fixed schema.

Note: You can also convert a DynamicFrame to a DataFrame using toDF() (refer here: def toDF); a quick sketch follows below.
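As a minimal sketch, assuming `dyf` is a DynamicFrame that was read earlier in the script and "status" is a placeholder column:

```python
# toDF() hands back a plain Spark DataFrame, so the usual pyspark.sql API applies.
df = dyf.toDF()
df.filter(df["status"] == "active").show()
```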

- A DataFrame will have a set schema (schema on read). Your data can be nested, but it must be schema on read. In the case where you can't do schema on read, a DataFrame will not work. I have yet to see a case where someone is using dynamic frames for something that varies so much that it can't be schema on read... I guess the only option then for non-Glue users is to use RDDs. I would love to see a benchmark of dynamic frames vs. DataFrames... ;-) all those cool additions made to DataFrames that reduce shuffle, etc. – Brian Jan 03 '22 at 18:18
A DynamicFrame is similar to a DataFrame, except that each record is self-describing, so no schema is required initially

DynamicFrames are intended for schema management. So, as soon as you have a fixed schema, go ahead and convert to a Spark DataFrame with the toDF() method and use PySpark as usual.
A DynamicFrame is similar to a DataFrame, except that each record is self-describing, so no schema is required initially. Instead, AWS Glue computes a schema on-the-fly ...
You can convert DynamicFrames to and from DataFrames after you resolve any schema inconsistencies.
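A minimal sketch of that round trip, assuming `glueContext` and `dyf` already exist and the column names are placeholders:

```python
from awsglue.dynamicframe import DynamicFrame
from pyspark.sql import functions as F

# DynamicFrame -> DataFrame: do ordinary Spark work once the schema is fixed.
df = dyf.toDF().withColumn("amount_rounded", F.round(F.col("amount"), 2))

# DataFrame -> DynamicFrame: wrap it back to use Glue sinks, bookmarks, etc.
dyf_back = DynamicFrame.fromDF(df, glueContext, "dyf_back")
```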

- I noticed that applying the toDF() method to a DynamicFrame takes several minutes when the amount of data is large. What can we do to make it faster besides adding more workers to the job? – ruifgmonteiro Nov 10 '20 at 12:23
- In my case, I bypassed this by discarding DynamicFrames, because data type integrity was guaranteed, so I just used the spark.read interface. – SantiagoRodriguez Dec 16 '20 at 08:57
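For reference, the kind of plain Spark read being described might look roughly like this (the path and file format are assumptions; `glueContext` comes from the usual Glue job setup):

```python
spark = glueContext.spark_session  # or a SparkSession built directly

# Read straight into a Spark DataFrame, bypassing DynamicFrames entirely.
df = spark.read.parquet("s3://my-bucket/input/")  # placeholder path
```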
- It would be better to avoid back-and-forth conversions as much as possible. – Brian Jan 03 '22 at 18:13
- I want to use bookmarks with DynamicFrames, but it takes 3 times longer than a DataFrame. Is there any other way? – sweety Mar 13 '23 at 06:20