
What are the advantages and disadvantages of using Python or Java when developing Apache Flink Stateful Functions?

  • Is there any performance difference? Which one is more efficient for the same operation?
  • Can we develop the application completely in Python?
  • What are the features that one supports and the other does not?

1 Answer


StateFun supports embedded functions and remote functions.

  • Embedded functions are bundled and deployed within the JVM processes that run Flink. Therefore they must be implemented in a JVM language (like Java) and they would be the most performant. The downside is that any change to the function code requires a restart of the Flink cluster.

  • Remote functions execute in a separate process and are invoked by the Flink cluster for every incoming message addressed to them. Therefore they are expected to be less performant than embedded functions, but they provide great flexibility in:

    • Choosing an implementation language
    • Fast scaling up and down
    • Fast restart in case of a failure
    • Rolling upgrades
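
To make the performance difference concrete, here is a minimal sketch in plain Python. It does not use the StateFun SDK; `greet`, `invoke_embedded`, and `invoke_remote` are hypothetical names made up for illustration, and the "remote" hop is only simulated with a JSON round trip instead of a real network call. It shows why a remote invocation does more work per message than an embedded one: the message and the reply each have to be serialized and deserialized on the way across the process boundary.

```python
import json
from typing import Callable, Dict

# Hypothetical handler type: takes a message dict and returns a reply dict.
Handler = Callable[[Dict[str, str]], Dict[str, str]]


def greet(message: Dict[str, str]) -> Dict[str, str]:
    """Toy function logic shared by both invocation styles."""
    return {"greeting": f"Hello, {message['name']}!"}


def invoke_embedded(fn: Handler, message: Dict[str, str]) -> Dict[str, str]:
    """Embedded style: the runtime calls the function directly, in-process."""
    return fn(message)


def invoke_remote(fn: Handler, message: Dict[str, str]) -> Dict[str, str]:
    """Remote style (simulated): the request and the reply cross a process
    boundary, so each one is serialized to bytes and parsed again."""
    request_bytes = json.dumps(message).encode("utf-8")

    # Simulated remote side: decode the request, run the function,
    # encode the reply. A real deployment would do this over the network.
    decoded_request = json.loads(request_bytes.decode("utf-8"))
    reply_bytes = json.dumps(fn(decoded_request)).encode("utf-8")

    # Back on the calling side: decode the reply.
    return json.loads(reply_bytes.decode("utf-8"))


if __name__ == "__main__":
    msg = {"name": "Ada"}
    print(invoke_embedded(greet, msg))
    print(invoke_remote(greet, msg))
```

Both calls return the same reply; the remote path simply adds encoding, decoding, and (in a real deployment) network latency on top of the function call itself.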

Can we develop the application completely in Python?

It is possible to develop an application completely in Python; see the Python greeter example.

What are the features that one supports and the other does not?

The following features are currently supported only in the Java SDK:

  • Richer routing logic from an ingress to a function: any routing logic that you can describe in code.
  • A few more state types, such as a table and a buffer.
  • Exposing existing Flink sources and sinks as ingresses and egresses.
Igal
  • Not sure if that is what you are looking for, but have you looked into Faust, a native Python streaming framework? – Sebastian Zaba May 14 '20 at 03:50
  • This is an extremely valuable conclusion. Do you know how much more performant native Java functions would be? – Kanso Code Nov 10 '20 at 06:08
  • @Mikki It is hard to say conclusively without more information. Invoking a function that executes within the Flink process is essentially a method call in Java, while invoking a remote function involves an RPC. On the other hand, remote functions can be scaled easily, while embedded functions are harder to scale (they require a stateful upgrade). – Igal Nov 10 '20 at 14:59