I want to pass dataframe which has set of values to new query but it fails.
1) Here I am selecting particular column so that I can pass under ISIN in next query
scala> val managerIdDf=finalEmployeesDf.filter($"manager_id"!==0).select($"manager_id").distinct
managerIdDf: org.apache.spark.sql.DataFrame = [manager_id: bigint]
2) My sample data:
scala> managerIdDf.show
+----------+
|manager_id|
+----------+
| 67832|
| 65646|
| 5646|
| 67858|
| 69062|
| 68319|
| 66928|
+----------+
3) When I execute final query it fails:
scala> finalEmployeesDf.filter($"emp_id".isin(managerIdDf)).select("*").show
java.lang.RuntimeException: Unsupported literal type class org.apache.spark.sql.DataFrame [manager_id: bigint]
I also tried converting to List
and Seq
but it generates an error only. Like below when I try to convert to Seq
and re run the query it throws an error:
scala> val seqDf=managerIdDf.collect.toSeq
seqDf: Seq[org.apache.spark.sql.Row] = WrappedArray([67832], [65646], [5646], [67858], [69062], [68319], [66928])
scala> finalEmployeesDf.filter($"emp_id".isin(seqDf)).select("*").show
java.lang.RuntimeException: Unsupported literal type class scala.collection.mutable.WrappedArray$ofRef WrappedArray([67832], [65646], [5646], [67858], [69062], [68319], [66928])
I also referred this post but in vain. This type of query I am trying it for solving subqueries in spark dataframe. Anyone here pls ?