How can I join two RDDs on the item_id column?
```python
RDD1 = spark.createDataFrame(
    [('45QNN', 867),
     ('45QNN', 867),
     ('45QNN', 900)],
    ['id', 'item_id']).rdd

RDD2 = spark.createDataFrame(
    [('867', 229000, 'house', 90),
     ('900', 350000, 'apartment', 120)],
    ['item_id', 'amount', 'parent', 'size']).rdd
```
As suggested in "How do you perform basic joins of two RDD tables in Spark using Python?", I tried the following, but I get an empty data set:
```python
innerJoinedRdd = RDD1.join(RDD2)
```

or

```python
RDD2.join(RDD1, RDD1("item_id") == RDD2("item_id")).take(5)
```
I need all the columns except parent in the result. What am I doing wrong?