3

How can we write user-defined functions (UDFs) in an AWS Glue script using PySpark (Python), on either a DynamicFrame or a DataFrame?

Grant Miller
Vinay Agarwal

2 Answers

2

A DynamicFrame doesn't support UDFs the way the DataFrame API does. The closest equivalent is the Map.apply transform, which applies a function to every record in the DynamicFrame.
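
As a rough sketch of what Map.apply looks like in practice: the function receives one record at a time and returns the (possibly modified) record. The field names "price" and "price_clean" are hypothetical, and the sketch assumes the record can be indexed like a dict, as in the AWS examples. The awsglue import is deferred so the record function can be tried locally without the Glue libraries.

```python
def chop_first_char(record):
    """Record-level function: copy "price" without its first character.

    "price" and "price_clean" are hypothetical field names, and the field
    is assumed to be present in every record.
    """
    value = record["price"]
    if isinstance(value, str):
        record["price_clean"] = value[1:]
    return record


def apply_to_dynamic_frame(dyf):
    """Apply chop_first_char to every record of a DynamicFrame.

    Only runs inside a Glue environment, where awsglue is available.
    """
    from awsglue.transforms import Map

    return Map.apply(frame=dyf, f=chop_first_char)
```

Because the record function is plain Python, it can be checked on an ordinary dict before the job is deployed, for example chop_first_char({"price": "$12.50"}).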

Brian
-1

"AWS Glue does not yet directly support Lambda functions, also known as user-defined functions. But you can always convert a DynamicFrame to and from an Apache Spark DataFrame to take advantage of Spark functionality in addition to the special features of DynamicFrames." - AWS Glue Medicare Python samples

The AWS Glue Medicare Python samples (quoted/linked above) include a Spark UDF example:

from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

# medicare_dataframe is assumed to be an existing Spark DataFrame,
# e.g. the result of calling toDF() on a DynamicFrame.

# UDF that drops the first character of a string (nulls pass through).
chop_f = udf(lambda x: x[1:] if x is not None else None, StringType())

medicare_dataframe = (
    medicare_dataframe
    .withColumn("ACC", chop_f(col("average covered charges")))
    .withColumn("ATP", chop_f(col("average total payments")))
    .withColumn("AMP", chop_f(col("average medicare payments")))
)
medicare_dataframe.select(["ACC", "ATP", "AMP"]).show()

This is just standard Spark code. If you want to call a UDF from Spark SQL instead, see this Databricks example.
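
For the Spark SQL route, a minimal sketch might look like the following. It converts a DynamicFrame to a DataFrame, registers a Python function as a SQL UDF, queries it, and converts the result back. The helper name, the UDF name "chop", the view name "charges", and the frame name passed to fromDF are all made up for illustration. The Spark and Glue imports are deferred so the plain Python function can be tested without a cluster.

```python
def chop_first_char(value):
    """Drop the first character of a string; pass None through unchanged."""
    return value[1:] if value is not None else None


def chop_with_spark_sql(glue_context, dyf):
    """Hypothetical helper: DynamicFrame -> DataFrame -> SQL UDF -> DynamicFrame.

    Assumes a Glue job environment (awsglue and pyspark available) and a
    column named "average covered charges" in the input frame.
    """
    from awsglue.dynamicframe import DynamicFrame
    from pyspark.sql.types import StringType

    spark = glue_context.spark_session

    # Register the Python function under a SQL-callable name.
    spark.udf.register("chop", chop_first_char, StringType())

    # Expose the data to Spark SQL through a temporary view.
    dyf.toDF().createOrReplaceTempView("charges")

    # Backticks are needed because the column name contains spaces.
    result_df = spark.sql(
        "SELECT *, chop(`average covered charges`) AS ACC FROM charges"
    )

    # Convert back so Glue-specific transforms and sinks can be used.
    return DynamicFrame.fromDF(result_df, glue_context, "charges_with_acc")
```

Keeping the UDF body as an ordinary function means the same logic can be wrapped with pyspark.sql.functions.udf for the DataFrame API or registered for SQL, whichever the job needs.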

bsplosion
Kyle