
This is in reference to a Scala solution given in the thread "How to avoid duplicate columns after join?".

>>> a.show()
+---+---+
|key|val|
+---+---+
|  a|  1|
|  b|  2|
+---+---+

and

>>> b.show()
+---+---+
|key|val|
+---+---+
|  a| 11|
+---+---+

Expected output:

+---+---+
|key|val|
+---+---+
|  a|  1|
+---+---+

So I need to fetch the rows of dataframe "a" whose "key" also exists in "b", keeping only the columns of "a".

One of the solutions given in Scala works; it is shown below:

scala> a.join(b, a("key") === b("key"), "left").select(a.columns.map(a(_)) : _*).show

Since I have no knowledge of Scala, I am not able to implement this in Python. Kindly help me write this in Python. Any other solution would be appreciated (without hardcoding the columns of the dataframe).

Bharat Sharma

1 Answer

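// Build the two sample DataFrames (run in spark-shell, where the implicits needed by toDF are already in scope)
val a = sc.parallelize(Seq(("a","1"),("b","2"))).toDF("key","value")
a.show

val b = sc.parallelize(Seq(("a","11"))).toDF("key","value")
b.show

// A left semi join keeps only the rows of a whose key has a match in b, and returns only a's columns
a.join(b, a("key") === b("key"), "leftsemi").show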
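Since the question asks for Python: PySpark's DataFrame.join accepts the same join type, so the line above can most likely be written as a.join(b, on="key", how="left_semi"), which needs no hardcoded column list. The sketch below is not Spark code. It only illustrates what a left semi join returns, using plain lists of dicts in place of DataFrames. The helper name left_semi_join is hypothetical, and treating None keys as never matching is an assumption meant to mirror SQL equality.

```python
def left_semi_join(left, right, key):
    """Return the rows of `left` whose `key` value also occurs in `right`.

    Only the columns of `left` are kept, and each left row appears at most
    once no matter how many matching rows `right` contains.
    """
    # Collect the keys present on the right side; None is skipped so that
    # missing keys never match (assumption: mirrors SQL equality semantics).
    right_keys = {row[key] for row in right if row[key] is not None}
    return [row for row in left if row[key] in right_keys]


# Same sample data as in the question.
a = [{"key": "a", "val": "1"}, {"key": "b", "val": "2"}]
b = [{"key": "a", "val": "11"}]

result = left_semi_join(a, b, "key")
print(result)  # [{'key': 'a', 'val': '1'}]
```

Unlike the "left" join followed by a select of a's columns, a semi join drops the rows of "a" that have no match in "b" (here the row with key "b"), which is what the expected output in the question shows.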


Chandan Ray