Can someone explain why we need transform & transform_df methods separately?

sumitkanoje

2 Answers

There's a small difference between the @transform and @transform_df decorators in Code Repositories:

  • @transform_df operates exclusively on DataFrame objects.
  • @transform operates on transforms.api.TransformInput and transforms.api.TransformOutput objects rather than DataFrames.

If your data transformation depends exclusively on DataFrame objects, you can use the @transform_df decorator. This decorator injects DataFrame objects into your compute function and expects it to return a DataFrame, which is then written to the output for you.

Alternatively, you can use the more general @transform decorator and explicitly call the dataframe() method on each input to access a DataFrame containing your input dataset.
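For illustration, here is a minimal sketch of that pattern (the dataset paths are placeholders, not real Foundry identifiers):

from transforms.api import transform, Input, Output

@transform(
    my_output=Output("/path/to/output"),  # placeholder dataset path
    my_input=Input("/path/to/input"),     # placeholder dataset path
)
def compute(my_output, my_input):
    # my_input is a TransformInput; .dataframe() returns a PySpark DataFrame
    df = my_input.dataframe()
    # Nothing is returned; the result must be written out explicitly
    my_output.write_dataframe(df)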

Adil B

One addition to @Adil B's answer: @transform_df can handle only one output, whereas @transform can have multiple outputs, but you are then in charge of writing each output yourself:

from pyspark.sql import DataFrame
from transforms.api import transform_df, Input, Output

@transform_df(
    Output("some_foundry_id"),
    input_dataset=Input("another_foundy_id"),
)
def compute(input_dataset: DataFrame) -> DataFrame:
    return input_dataset

The DataFrame you return here is saved by Foundry to the output dataset:

from transforms.api import transform, Input, Output, TransformInput, TransformOutput

@transform(
    input_1=Input("..."),
    output_1=Output("..."),
    output_2=Output("..."),
)
def compute(input_1: TransformInput, output_1: TransformOutput, output_2: TransformOutput) -> None:
    # With @transform nothing is saved automatically: read the input as a
    # DataFrame and write it to each output explicitly.
    output_1.write_dataframe(input_1.dataframe())
    output_2.write_dataframe(input_1.dataframe())
Grigory Sharkov