
I am working on a small project where I read data from Kafka and send each record to a UDF. Inside the UDF there is a while loop that I need to replace with tail recursion.

while (condition) {
    fields
    body
}

to

def whileReplacement(dummy: Int): Int = {
    if (!condition) return 1
    body
    whileReplacement(dummy) // tail call replacing the loop iteration
}

But I am getting `java.io.NotSerializableException`. I do not understand what is causing the error or how to solve it. If there is a better approach, please share it. Thank you.
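For concreteness, here is a minimal, self-contained sketch of the same transformation. The condition and body are hypothetical placeholders (summing the digits of a number), since the originals are not shown; the pattern is that the loop's mutable variables become parameters, the negated condition becomes the base case, and the body runs before the self-call in tail position:

```scala
import scala.annotation.tailrec

// Hypothetical stand-in for the original loop: sum the digits of n.
def sumDigitsWhile(n0: Int): Int = {
  var n = n0        // "fields" mutated by the loop
  var acc = 0
  while (n > 0) {   // condition
    acc += n % 10   // body
    n /= 10
  }
  acc
}

// The same logic as tail recursion. @tailrec makes the compiler verify
// that the self-call is in tail position, so it compiles to a loop.
@tailrec
def sumDigits(n: Int, acc: Int = 0): Int =
  if (n <= 0) acc
  else sumDigits(n / 10, acc + n % 10)
```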

Nagababu
  • [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) – Dmytro Mitin Sep 27 '22 at 09:27
  • https://stackoverflow.com/search?q=%5Bapache-spark%5D+NotSerializableException – Dmytro Mitin Sep 27 '22 at 09:32
  • https://medium.com/onzo-tech/serialization-challenges-with-spark-and-scala-a2287cd51c54 https://medium.com/onzo-tech/serialization-challenges-with-spark-and-scala-part-2-now-for-something-really-challenging-bd0f391bd142 https://stackoverflow.com/questions/40818001/understanding-spark-serialization You can switch on `scalacOptions += "-Xprint:typer"` in `build.sbt` and compare how Scala compiles the former and latter, and see what can make the difference. – Dmytro Mitin Sep 27 '22 at 10:58
  • Try to replace a method with a function `val whileReplacement: Int => Int = dummy => { ... }`. Sometimes this restores serialization https://stackoverflow.com/questions/22592811/task-not-serializable-java-io-notserializableexception-when-calling-function-ou Also you can check whether there is something useful in `Serialization stack: ...` from `SerializationDebugger` (`-Dsun.io.serialization.extendedDebugInfo=true`, is it default?) https://stackoverflow.com/questions/39150003/user-defined-variables-in-spark-org-apache-spark-sparkexception-task-not-seri – Dmytro Mitin Sep 27 '22 at 13:13
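The method-vs-function suggestion in the last comment can be sketched as follows (the bodies are hypothetical placeholders). The point is that a `def` is compiled as a member of its enclosing class, so a Spark closure calling it may capture the whole enclosing instance, while a function value is a standalone `Function1` object, and Scala function values extend `Serializable`:

```scala
object FunctionValueSketch {
  // A def belongs to its enclosing class/object; a closure that calls a
  // method defined in a non-serializable class captures that instance.
  def loopMethod(n: Int): Int =
    if (n <= 0) 0 else 1 + loopMethod(n - 1)

  // A function value is its own object; only the lambda (and whatever it
  // actually references) has to be serialized with the closure.
  val loopFn: Int => Int = n =>
    if (n <= 0) 0 else 1 + loopFn(n - 1)
}
```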

1 Answer


Previously, I declared and called the recursive function inside the UDF itself.

The problem was solved by moving the recursive function outside of the UDF. I think that defining the function outside lets Spark ship it to the executors without serializing the enclosing closure, which avoids the serialization problem in this case. This is just my understanding; I am not sure what is really happening, so if anyone knows, please explain it.
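A minimal sketch of that arrangement, with a hypothetical helper since the original UDF is not shown: the recursive function lives in a top-level serializable object, so the UDF closure only needs a reference to that object rather than to whatever (possibly non-serializable) class the UDF was defined in:

```scala
object RecordHelpers extends Serializable {
  import scala.annotation.tailrec

  // Hypothetical helper: count the trailing zeros of an integer.
  @tailrec
  def countTrailingZeros(n: Int, acc: Int = 0): Int =
    if (n == 0 || n % 10 != 0) acc
    else countTrailingZeros(n / 10, acc + 1)
}

// In the Spark job (shown for context; requires a SparkSession):
// val myUdf = org.apache.spark.sql.functions.udf(
//   (n: Int) => RecordHelpers.countTrailingZeros(n))
```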

Nagababu