I am pretty new to Apache Spark SQL and am trying to achieve the following: I have a DF holding two arrays, which I want to convert to an intermediate DF and then to JSON.

array [a,b,c,d,e] and array [1,2,3,4,5]

I need them to be:

a 1
b 2
c 3

I tried the explode option, but I only manage to get one array exploded.
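
For reference, the input can be reproduced roughly like this (a minimal sketch; the column names letters and numbers are just placeholders):

    import spark.implicits._

    // Hypothetical shape of the input: one row, two array columns.
    val input = Seq((Seq("a", "b", "c", "d", "e"), Seq(1, 2, 3, 4, 5)))
      .toDF("letters", "numbers")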

Thanks for the assistance.

sarashan

1 Answer


To join two dataframes in Spark you need a common column that exists in both, and since you don't have one you need to create it. Since version 1.6.0, Spark supports this through the monotonically_increasing_id() function. Note that the generated IDs are guaranteed to be increasing and unique but not consecutive, so the join aligns rows correctly only when both dataframes are partitioned the same way, as they are in this small example. The next code illustrates this case:

    import org.apache.spark.sql.functions._
    import spark.implicits._

    // Build the first dataframe and attach a generated id column.
    val df = Seq("a", "b", "c", "d", "e")
      .toDF("val1")
      .withColumn("id", monotonically_increasing_id)

    // Do the same for the second dataframe.
    val df2 = Seq(1, 2, 3, 4, 5)
      .toDF("val2")
      .withColumn("id", monotonically_increasing_id)

    // Join on the generated id to pair elements positionally.
    df.join(df2, "id").select($"val1", $"val2").show(false)

Output:

+----+----+
|val1|val2|
+----+----+
|a   |1   |
|b   |2   |
|c   |3   |
|d   |4   |
|e   |5   |
+----+----+
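
As a side note: if the two sequences actually live in one dataframe as array columns, as the question suggests, Spark 2.4+ can zip them positionally without a join, using arrays_zip with explode; to_json then covers the final step to JSON. A minimal sketch, assuming the array columns are named letters and numbers:

    import org.apache.spark.sql.functions._
    import spark.implicits._

    // Hypothetical input: one row holding both arrays as columns.
    val arrDf = Seq((Seq("a", "b", "c", "d", "e"), Seq(1, 2, 3, 4, 5)))
      .toDF("letters", "numbers")

    // arrays_zip pairs elements by position; explode turns each pair into a row.
    val zipped = arrDf
      .select(explode(arrays_zip($"letters", $"numbers")).as("pair"))
      .select($"pair.letters".as("val1"), $"pair.numbers".as("val2"))

    zipped.show(false)

    // The question also asks for JSON: render each row as a JSON string.
    zipped.select(to_json(struct($"val1", $"val2")).as("json")).show(false)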

Good luck

abiratsis