4

I have the following two dataframes .

dataframe1
┌────────────┬─────────────┬──────────────┐
│idZones     │Longitude    │latitude      |
├────────────┼─────────────┼──────────────┤
|[50,30,10]  |-7.073781666 |33.826661     |
└────────────┴─────────────┴──────────────┘
dataframe2
┌────────────┬─────────────┬──────────────┐
│id          │col1         │col2          │
├────────────┼─────────────┼──────────────┤
│10          │aaaaaaaaaaaa │bb32          │
│90          │ppp          │cc20          │
└────────────┴─────────────┴──────────────┘

I want the following output

┌────────────┬─────────────┬──────────────┐
│id          │col1         |col2          │
├────────────┼─────────────┼──────────────┤
│10          │aaaaaaaaaaaa │bb32          │
└────────────┴─────────────┴──────────────┘

I use the following code

 dataframe1.join(dataframe2,dataframe2.col("id").isin(dataframe1.col("idZones")));

Note the idZones column is an array[int]

I get this error

cannot resolve '(`id` IN (dataframe1.`idZones`))' due to data type mismatch: Arguments must be same type but were: int != array<int>;;

I need your help

Thank you

Avi Cohen
  • 117
  • 5
HBoulmi
  • 333
  • 5
  • 16

2 Answers2

1

Try replacing isin with array_contains.

dataframe1.join(dataframe2, expr("array_contains(idZones, id)"))

(fixed code, thanks @Shantanu Kher)

Another way, you can just explode dataframe1.col("idZones"), and then do a join.

Rayan Ral
  • 1,862
  • 2
  • 17
  • 17
  • thank you for your reply .But Cannot resolve method 'array_contains' in 'Column' , and I want to take the lines of dataframe2 witch id is in idZones. – HBoulmi Jul 08 '20 at 20:19
  • 1
    correct syntact for array_contains - dataframe1.join(dataframe2, expr("array_contains(idZones, id)")).show(false) – Shantanu Kher Jul 08 '20 at 21:04
0

As @Rayan suggested, you can use explode before you join dataframe1 with dataframe2

Creating dataframes -

val dataframe1 = Seq((Array(50,30,10),-7,30)).toDF("idZones","longitude","latitude")

val dataframe2 = Seq((10,"aaaaa","bb32"),(90,"ppp","cc20")).toDF("id","col1","col2")

Explosion logic -

dataframe1.select(explode($"idZones").as("idZones")).join(dataframe2, $"idZones" === dataframe2("id")).show(false)

Output -

+-------+---+-----+----+
|idZones|id |col1 |col2|
+-------+---+-----+----+
|10     |10 |aaaaa|bb32|
+-------+---+-----+----+
Shantanu Kher
  • 1,014
  • 1
  • 8
  • 14