I am writing Spark applications in Scala using IntelliJ IDEA, with Maven as the build tool.
I deploy them to an Azure HDInsight cluster, and I have the Azure plugin for IntelliJ installed for that.
I use Event Hubs to stream data and apply some transformations before writing the results to storage.
I am pretty new to all of Spark, Scala, IntelliJ and Event Hubs.
I run and debug the programs in two different ways:
- Build the jar (using mvn clean and mvn package) and use spark-submit to submit the application to the Spark cluster.
- Click the small play button to the left of the object that has the main function to execute the code from the IDE.
I have a fair idea of what Maven does: I think it downloads the dependencies listed in pom.xml to a local location, the .m2 folder in the user's home directory. When I run mvn package, the code is compiled against these jars, so every reference to a library is checked, and then the application jar is built.
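To make my understanding of the local repository concrete, here is a minimal sketch of how Maven coordinates map to a jar path under the default ~/.m2/repository layout. The object and method names (LocalRepoPath, jarPath) are made up for illustration, the version "2.3.1" is only a placeholder, and the sketch ignores any localRepository override in settings.xml.

```scala
import java.nio.file.{Path, Paths}

object LocalRepoPath {
  // Default local repository location. A localRepository entry in
  // settings.xml can change this; the sketch does not read that file.
  val defaultRepo: Path =
    Paths.get(System.getProperty("user.home"), ".m2", "repository")

  // groupId dots become directories, followed by artifactId/version/,
  // and the file is named artifactId-version.jar.
  def jarPath(groupId: String,
              artifactId: String,
              version: String,
              repo: Path = defaultRepo): Path = {
    val groupDir = groupId.split('.').foldLeft(repo)((p, dir) => p.resolve(dir))
    groupDir
      .resolve(artifactId)
      .resolve(version)
      .resolve(s"$artifactId-$version.jar")
  }

  def main(args: Array[String]): Unit = {
    // "2.3.1" is a placeholder version, not necessarily the one in my pom.xml.
    val jar = jarPath("com.microsoft.azure", "azure-eventhubs-spark_2.11", "2.3.1")
    println(s"$jar exists: ${jar.toFile.exists}")
  }
}
```

Comparing this path with the jar path IntelliJ shows under External Libraries should reveal whether both tools point at the same file.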
I would like to understand how dependencies are resolved in IntelliJ IDEA when I run the code the second way (from the IDE). I have hit two problems:
- mvn clean and mvn package both succeed: Maven cleaned, ran the test cases and built the jar. In the IDE, however, some method calls were highlighted in red (not found). When I Ctrl+clicked through to the EventData class decompiled from bytecode, the method was indeed missing there. Yet when I checked the jar listed under External Libraries in the Project pane, the method existed in that jar. The jar that lacked the method was probably in some other folder, such as .ivy.
- mvn clean and mvn package both succeed, and the IDE shows no red marks for a missing value, but when I run the code with the green play button it fails with an error saying the value was not found. I can even Ctrl+click, navigate to the class, and see that the value exists.
Both errors are related to Event Hubs. One suggestion I found was that the referenced jar might be a different version from the one required, and that I should match the Event Hubs connector version to my Spark version. I tried that as well, with the same results as above: it passes in Maven and fails in the IDE.
<dependency>
    <groupId>com.microsoft.azure</groupId>
    <artifactId>azure-eventhubs-spark_2.11</artifactId>
    <version>${spark.version}</version>
</dependency>
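One check that might narrow this down is to print, at runtime, which jar a class was actually loaded from and which classpath entries mention Event Hubs. Below is a minimal sketch using only standard JDK calls. The object and method names (WhichJar, locationOf, classpathEntries) are made up for illustration, and the fully qualified name com.microsoft.azure.eventhubs.EventData in the comment is my assumption about where that class lives.

```scala
import java.io.File

object WhichJar {
  // Jar or directory a class was loaded from.
  // Returns None when the JVM reports no code source (e.g. JDK core classes).
  def locationOf(cls: Class[_]): Option[String] =
    for {
      domain <- Option(cls.getProtectionDomain)
      source <- Option(domain.getCodeSource)
      url    <- Option(source.getLocation)
    } yield url.toString

  // Entries of the JVM classpath whose path contains the given fragment
  // (case-insensitive).
  def classpathEntries(fragment: String): Seq[String] =
    System.getProperty("java.class.path", "")
      .split(File.pathSeparator)
      .filter(_.toLowerCase.contains(fragment.toLowerCase))
      .toSeq

  def main(args: Array[String]): Unit = {
    // In the real project, replace classOf[Option[_]] with
    // classOf[com.microsoft.azure.eventhubs.EventData] (assumed package name).
    println(locationOf(classOf[Option[_]]))
    classpathEntries("eventhubs").foreach(println)
  }
}
```

Running this once with the green play button and once through spark-submit should show whether the two runs load the class from different jars. IntelliJ may shorten long command lines with a classpath file or a wrapper jar, in which case java.class.path may not list every entry and the locationOf result is the more reliable of the two.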
I think Maven uses my .m2 folder and the jars inside it to build the project, while IntelliJ uses something else (maybe Ivy) to resolve dependencies in its development environment. Can anyone help me understand and solve this?
- Is there a way to find out, and to tell IntelliJ, which specific version of a jar to use, other than specifying it in pom.xml?
- Is there a way to tell IntelliJ to use the jars Maven has collected, so that mvn package and the IDE resolve dependencies from the same jars?