
I have seen ctx used in a code repo. What exactly is it? Is it a built-in library? When would I use it?

I've seen it in examples such as the following:

df = ctx.spark.createdataframe(...

1 Answer


For Code Repositories transforms, you can optionally include a parameter ctx, which gives you access to the underlying infrastructure running your job. Most commonly, you'll use the ctx.spark_session attribute to create your own pyspark.sql.DataFrame objects from plain Python objects, like:

from transforms.api import transform_df, Output
from pyspark.sql import types as T

@transform_df(
    Output("/my/output")
)
def my_compute_function(ctx):
    # Declare an explicit schema for the new DataFrame
    schema = T.StructType(
        [
            T.StructField("name", T.StringType(), True)
        ]
    )
    # ctx.spark_session is the Spark session running this transform
    return ctx.spark_session.createDataFrame([["Alex"]], schema=schema)

You'll find a full API description in the documentation for the transforms.api.TransformContext class, which lists the attributes available for you to read, such as spark_session and parameters.
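
As a hedged sketch of what reading those attributes can look like (the output path is a placeholder, and I'm assuming parameters behaves like a plain mapping, so double-check against the TransformContext docs):

from transforms.api import transform_df, Output

@transform_df(
    Output("/my/parameterized/output")  # placeholder path
)
def my_parameterized_compute_function(ctx):
    # spark_session is the session running this job
    spark = ctx.spark_session
    # Assumption: parameters is a dict-like mapping; fall back to a
    # default row count if nothing is configured
    n_rows = int((ctx.parameters or {}).get("n_rows", 3))
    # spark.range produces a single-column DataFrame named "id"
    return spark.range(n_rows).withColumnRenamed("id", "row_id")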

Note: the spark_session attribute has type pyspark.sql.SparkSession
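
Since it's a regular SparkSession, the usual construction patterns also work. For instance, here is a minimal sketch (with a hypothetical output path) that lets Spark infer the schema from Row objects instead of declaring a StructType:

from transforms.api import transform_df, Output
from pyspark.sql import Row

@transform_df(
    Output("/my/row/output")  # placeholder path
)
def my_row_compute_function(ctx):
    # Row objects let Spark infer the schema rather than building one by hand
    rows = [Row(name="Alex"), Row(name="Sam")]
    return ctx.spark_session.createDataFrame(rows)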
