
I was working on creating a function that takes a connection string, a SQL query, and connection properties as arguments.
The first scenario works fine, but the second scenario fails with the error mentioned below.

First Scenario Works:

    val readSqlData = spark.read.jdbc(connectionString,_:String,connectionProps)  
    val data= readSqlData("(SELECT * FROM TestTable) as TestTable")  

The above two lines give me a data value of type DataFrame.
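For readers unfamiliar with the underscore syntax: partially applying a multi-argument method with `_: String` yields a function value whose remaining parameter is the string slot. A minimal, Spark-free sketch of the same pattern (the names `connect`, the URL, and the props map are illustrative stand-ins, not Spark's API):

```scala
object PartialApplicationSketch {
  // A three-argument method, standing in for spark.read.jdbc(url, table, props)
  def connect(url: String, query: String, props: Map[String, String]): String =
    s"ran '$query' against $url"

  // Partially apply: fix url and props, leave the query slot open.
  // readSql has type String => String, analogous to String => DataFrame.
  val readSql: String => String =
    connect("jdbc:example://host/db", _: String, Map("user" -> "test"))
}
```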

Second Scenario :

Now I am trying to create a helper function that can be called from anywhere, so we don't have to pass the connection string and connection properties for every SQL statement we run:

    import org.apache.spark.sql.DataFrame
    def PerformSqlOperations(): String => DataFrame = {
         spark.read.jdbc(connectionString,_:String,connectionProps)
    }

The function compiles properly, but when I call it, passing the SQL query to execute as below:

    PerformSqlOperations("(SELECT * FROM TestTable) as TestTable")   

I get the error `too many arguments for method PerformSqlOperations()`.

I am not able to understand why this is happening, as the working code above is similar; I was just trying to wrap it inside a function to simplify multiple calls.

Any help or idea would be helpful in letting me know why the function creation and execution gives the error mentioned.

Gal Naor
Nikunj Kakadiya
  • `def PerformSqlOperations()` seems to be missing an input parameter, try `def PerformSqlOperations(query: String)`. – Mario Galic Jul 11 '19 at 13:14
  • @MarioGalic In a partially applied function, the underscore stands for the argument you supply when the resulting function is called, if I understood that correctly. – Nikunj Kakadiya Jul 11 '19 at 13:27
  • The problem is actually pretty simple. In your first case `readSqlData` is a **function** from `String => DataFrame`. Whereas, in your second case `PerformSqlOperations` is a **method** _(which is very [different](https://stackoverflow.com/questions/2529184/difference-between-method-and-function-in-scala) from a function)_ which does not take any arguments and returns a **function** from `String => DataFrame`. Your second case has to be called like `PerformSqlOperations()("(SELECT * FROM TestTable) as TestTable") ` first empty parenthesis are for the method and then the function. – Luis Miguel Mejía Suárez Jul 11 '19 at 13:35
  • However, that does not seems useful at all. So, or just leave the **function** from the first example. Or rewrite the second as `def PerformSqlOperations(query: String): DataFrame = spark.read.jdbc(connectionString, query, connectionProps)`. – Luis Miguel Mejía Suárez Jul 11 '19 at 13:36
  • Thanks @MarioGalic and Luis for your suggestions, as they forced me to try some options and it worked. – Nikunj Kakadiya Jul 11 '19 at 13:46
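The method-vs-function distinction from the comments can be demonstrated without Spark. Here `makeGreeter` plays the role of `PerformSqlOperations` (the names are illustrative): calling a zero-argument method that returns a function requires two applications, while a function value needs only one.

```scala
object MethodVsFunction {
  // A method with an empty parameter list that returns a function value.
  // Invoking it fully takes two steps: one for the method, one for the function.
  def makeGreeter(): String => String = { name => s"hello, $name" }

  val viaMethod: String = makeGreeter()("world") // method call, then function call

  // A function value, like readSqlData in the first scenario: one application suffices.
  val greeter: String => String = makeGreeter()
  val viaFunction: String = greeter("world")
}
```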

1 Answer

    import org.apache.spark.sql.DataFrame
    def PerformSqlOperations: String => DataFrame = {
        spark.read.jdbc(connectionString, _: String, connectionProps)
    }

You just need to remove the `()` after the method name. Without the empty parameter list, `PerformSqlOperations("...")` invokes the method and then applies the returned function to the query, so it works as expected.
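Why dropping the parentheses helps: a paren-less `def` is invoked without `()`, so the single argument list in `PerformSqlOperations("...")` is applied to the returned function rather than to the method itself. A Spark-free sketch (the name `length` is illustrative):

```scala
object ParenlessDef {
  // With `def length(): String => Int`, the call length("query") would not compile:
  // the method takes zero arguments, hence "too many arguments".
  // With no parameter list at all, length("query") applies the returned function.
  def length: String => Int = s => s.length
}
```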

Nikunj Kakadiya
  • Why do you want a **method** that always returns the same **function**? Why not use a `val` like in your first case? – Luis Miguel Mejía Suárez Jul 11 '19 at 14:03
  • Because I want it to be called multiple times from different places in different classes, so I thought of creating a function that can be called from anywhere, passing only the query as a parameter. – Nikunj Kakadiya Jul 11 '19 at 14:19
  • You can do exactly the same with the `val`. Or, instead of a **method** that returns a **function**, just use a **method** that takes the _query_ and returns the `DataFrame`, as I showed in my other comment. – Luis Miguel Mejía Suárez Jul 11 '19 at 14:23