If you are running Spark 2.4 rather than the 3.0.1 indicated by the package you imported, you need to write a wrapper yourself, because in Spark 2.4 the spark-avro functions are available only for Java/Scala. Follow the instructions in this answer:
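```python
from pyspark import SparkContext
from pyspark.sql.column import Column, _to_java_column


def from_avro(col, jsonFormatSchema):
    # Decode a binary Avro column into a Spark value using the given Avro schema (JSON string).
    # Calls the Scala package object org.apache.spark.sql.avro through Py4J.
    sc = SparkContext._active_spark_context
    avro = sc._jvm.org.apache.spark.sql.avro
    f = getattr(getattr(avro, "package$"), "MODULE$").from_avro
    return Column(f(_to_java_column(col), jsonFormatSchema))


def to_avro(col):
    # Encode a column into Avro binary format via the same Scala package object.
    sc = SparkContext._active_spark_context
    avro = sc._jvm.org.apache.spark.sql.avro
    f = getattr(getattr(avro, "package$"), "MODULE$").to_avro
    return Column(f(_to_java_column(col)))
```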
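`from_avro` expects its `jsonFormatSchema` argument to be an Avro schema serialized as a JSON string. A minimal sketch of building one with the standard `json` module; the record name `User` and its fields are hypothetical placeholders for your own schema.

```python
import json

# Hypothetical Avro record schema; replace the name and fields with your own.
user_schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        # Nullable field: a union with "null" first, so the default can be null.
        {"name": "age", "type": ["null", "int"], "default": None},
    ],
}

# from_avro takes the schema as a JSON string, not a Python dict.
json_format_schema = json.dumps(user_schema)
print(json_format_schema)
```

You would then pass it as, for example, `from_avro(df["value"], json_format_schema)` on a binary column such as the `value` column of a Kafka source.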
Make sure the spark-avro dependency you pass to `--packages` has the right version, i.e. one that matches both your Spark version and the Scala version Spark was built with.
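spark-avro is published under the Maven coordinate `org.apache.spark:spark-avro_<scala binary version>:<spark version>`, so both parts have to match your installation. A sketch of a hypothetical helper, `spark_avro_package`, that builds the coordinate; the versions in the example are illustrative, so substitute your own.

```python
def spark_avro_package(spark_version: str, scala_binary_version: str) -> str:
    """Build the spark-avro Maven coordinate (hypothetical helper)."""
    return f"org.apache.spark:spark-avro_{scala_binary_version}:{spark_version}"


# Illustrative example: Spark 2.4.7 built against Scala 2.11
print(spark_avro_package("2.4.7", "2.11"))
# org.apache.spark:spark-avro_2.11:2.4.7
```

The resulting string goes to `spark-submit --packages`, or to the `spark.jars.packages` configuration before the session is created.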
If my assumption that you are running a Spark version below 3 is incorrect, please provide more details.
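To check the assumption, look at the running Spark version (`spark.version` on a session, or `pyspark.__version__`). From Spark 3.0 on, PySpark ships `from_avro` and `to_avro` in `pyspark.sql.avro.functions`, so the hand-written wrapper is only needed on 2.x. A minimal sketch with a hypothetical helper, `needs_avro_wrapper`, that makes this decision from a version string:

```python
def needs_avro_wrapper(spark_version: str) -> bool:
    """Return True if PySpark lacks built-in Avro functions (Spark < 3.0).

    Hypothetical helper; pass e.g. spark.version or pyspark.__version__.
    """
    major = int(spark_version.split(".")[0])
    return major < 3


print(needs_avro_wrapper("2.4.7"))  # True
print(needs_avro_wrapper("3.0.1"))  # False
```

When it returns `False`, you can import the built-in functions with `from pyspark.sql.avro.functions import from_avro, to_avro` instead of defining the wrapper.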