How can we write user-defined functions in an AWS Glue script using PySpark (Python) on either a DynamicFrame or a DataFrame?
2 Answers
A DynamicFrame doesn't support a UDF exactly the way the DataFrame API supports it. The best you will get is the Map.apply transform.
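For reference, a minimal sketch of what Map.apply looks like on a DynamicFrame (the frame name dyf and the column being cleaned are assumptions for illustration, not part of the original answer):

from awsglue.transforms import Map

def strip_leading_char(record):
    # Map.apply passes each record to f as a dict-like DynamicRecord;
    # mutate the fields you need and return the record.
    record["average covered charges"] = record["average covered charges"][1:]
    return record

cleaned_dyf = Map.apply(frame=dyf, f=strip_leading_char)  # dyf is an existing DynamicFrame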

Brian
"AWS Glue does not yet directly support Lambda functions, also known as user-defined functions. But you can always convert a DynamicFrame to and from an Apache Spark DataFrame to take advantage of Spark functionality in addition to the special features of DynamicFrames." - AWS Glue Medicaid Python samples
The AWS Glue Medicare Python samples (quoted/linked above) include a Spark UDF example:
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# UDF that chops off the first character of a string
# (used here to strip the leading "$" from the payment columns)
chop_f = udf(lambda x: x[1:], StringType())

medicare_dataframe = medicare_dataframe \
    .withColumn("ACC", chop_f(medicare_dataframe["average covered charges"])) \
    .withColumn("ATP", chop_f(medicare_dataframe["average total payments"])) \
    .withColumn("AMP", chop_f(medicare_dataframe["average medicare payments"]))

medicare_dataframe.select(['ACC', 'ATP', 'AMP']).show()
This is just standard Spark code. If you're looking to use Spark SQL instead, see this Databricks example.
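For completeness, a sketch of the same chop logic exposed to Spark SQL by registering the UDF (the temp view name medicare is an assumption, not from the original sample):

from pyspark.sql.types import StringType

# Register the UDF under a SQL-callable name
spark.udf.register("chop", lambda x: x[1:], StringType())

# Expose the DataFrame to Spark SQL and call the UDF in a query
medicare_dataframe.createOrReplaceTempView("medicare")
spark.sql("""
    SELECT chop(`average covered charges`)   AS ACC,
           chop(`average total payments`)    AS ATP,
           chop(`average medicare payments`) AS AMP
    FROM medicare
""").show()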