I created the following model:

class EquipmentEmbeddingEndpoint(mlflow.pyfunc.PythonModel):
    def load_context(self, context):    
        self.identifiers_df = get_identifier_information()

    def predict(self, context, model_input):
        print('got in predict function')
        return {"output": "dummy"}

The model calls get_identifier_information(), which is defined in the same Python notebook:

def get_identifier_information():
    identifiers_df = spark.sql(f"""
                SELECT * FROM third_party_products tpp
            """)
    return identifiers_df

This is how I log the model:

import numpy as np

with mlflow.start_run():
    sample_inputs = np.array(["btr197", "ao smith"])

    mlflow.pyfunc.log_model("test_equipment_embedding",
        python_model=EquipmentEmbeddingEndpoint(),
        registered_model_name='test_equipment_embedding',
        input_example=sample_inputs,
    )

And this is the error I am running into:

RuntimeError: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.

Could I please get some help figuring this out? Thanks in advance!

nikhil
  • Welcome! Can you [please read](https://meta.stackoverflow.com/a/285557/15405732) about the problems with images of text and then edit your question to add transcriptions of your images of text as actual text? Perhaps useful: [editing help](https://stackoverflow.com/editing-help). – Koedlt Jun 07 '23 at 18:49
  • Sounds good. Thank you! – nikhil Jun 07 '23 at 20:46
  • See SPARK-5063 as it suggests. See [this](https://stackoverflow.com/questions/29815878/how-to-deal-with-error-spark-5063-in-spark) and [this](https://stackoverflow.com/questions/31508689/spark-broadcast-variables-it-appears-that-you-are-attempting-to-reference-spar) also. TL;DR is that your code running on executor is trying to use `sc`, only the code running on driver is allowed to do so. It's like running `spark.read(...)` inside a UDF, it's not supported. – Kashyap Jun 07 '23 at 20:47
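A minimal sketch of the pattern the last comment points at, under the assumption that the Spark query can be materialized on the driver before the model is logged: run `get_identifier_information().toPandas()` (or `.collect()`) on the driver and hand the plain, picklable result to the model, so no SparkSession or SparkContext reference gets serialized with it. The class below is a stand-in (it does not subclass `mlflow.pyfunc.PythonModel`, and a list of dicts stands in for the collected DataFrame); names like `identifiers` are illustrative, not from the original code.

```python
# Hypothetical sketch: capture the query result eagerly on the driver
# instead of calling spark.sql() inside load_context().

class EquipmentEmbeddingEndpoint:  # in practice: mlflow.pyfunc.PythonModel
    def __init__(self, identifiers):
        # identifiers is plain, picklable data -- e.g. the result of
        # get_identifier_information().toPandas() computed on the driver.
        self.identifiers = identifiers

    def predict(self, context, model_input):
        # Only local data is touched here; no Spark objects are referenced.
        return {"output": "dummy", "n_identifiers": len(self.identifiers)}


# On the driver, before logging the model:
# identifiers = get_identifier_information().toPandas()
identifiers = [{"id": "btr197"}, {"id": "ao smith"}]  # stand-in data
model = EquipmentEmbeddingEndpoint(identifiers)
print(model.predict(None, ["btr197"]))
```

The model instance passed to `mlflow.pyfunc.log_model` would then carry the data itself rather than a closure over `spark`, which is what triggers SPARK-5063 when the pickled model is deserialized off the driver.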

0 Answers