
I am a Java developer writing stand-alone applications for Apache Spark. To create artifacts I use Gradle along with the ShadowJar plugin.

A few teammates want to use Python. Currently, they use JetBrains PyCharm to write Python scripts and execute them remotely on the Spark cluster. However, this process does not scale well (what do we do once more than one file is involved?), so I am looking for a solution within the Python ecosystem. The problem is that neither I nor any of my teammates is a Python expert (in fact, the other teammates are not developers at all, but have to write code anyway; management decisions...), so we have no clue what best practice for Python development looks like.

I tried PyGradle, but it did not feel smoothly integratable, especially with Apache Spark. I stumbled over names like Pip, Pex, Setuptools, and Virtualenv. What are these tools? How do they interact with each other?

To prevent the X-Y problem: I want a codebase that can be built, (unit-)tested, and packaged with a single command (like gradle build). The resulting artifact should be deployable and executable on a Spark cluster.

Skym0sh0

1 Answer


I am also new to this world and want to set up a build process at an AI startup. I think http://pybuilder.github.io/ is at least a good starting point for automation, as I am trying to set it up for our team as well.
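
To give an idea of what that looks like, here is a minimal PyBuilder build.py sketch; the project name, version, and directory layout below are assumptions for illustration, not something from the question:

    # build.py - minimal PyBuilder build script (illustrative sketch)
    from pybuilder.core import use_plugin, init

    # Standard plugins: core Python support, unit tests, and distribution building
    use_plugin("python.core")
    use_plugin("python.unittest")
    use_plugin("python.distutils")

    name = "spark-jobs"      # hypothetical project name
    version = "0.1.0"        # hypothetical version
    default_task = ["run_unit_tests", "publish"]  # what a plain "pyb" run executes

    @init
    def initialize(project):
        # PyBuilder's conventional layout: sources under src/main/python,
        # unit tests under src/unittest/python
        project.set_property("dir_source_main_python", "src/main/python")
        project.set_property("dir_source_unittest_python", "src/unittest/python")

After pip install pybuilder, running pyb in the project root builds, tests, and packages the project in one step, roughly analogous to gradle build; the distribution that ends up under target/dist could then be zipped and handed to spark-submit via --py-files.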