I want to join a lit() column with a non-literal column.

from pyspark.sql.functions import lit

rdd1 = spark.createDataFrame([('1', 'a'), ('2', 'b'), ('3', 'c')], ['id1', 'val'])
rdd1 = rdd1.withColumn('id2', lit('1'))

rdd2 = spark.createDataFrame([('1', 2, 1), ('2', 3, 0), ('3', 3, 1)], ['key1', 'key2', 'val'])

res = rdd1.join(rdd2, [rdd1['id2'] == rdd2['key1']],'left')

When I call explain() on the res DataFrame, I get a cartesian product error, even though the join condition is not a trivial one like where 1=1.

>>> res.explain()
== Physical Plan ==
org.apache.spark.sql.AnalysisException: Detected cartesian product for LEFT OUTER join between logical plans
Project [id1#1456, val#1457, 1 AS id2#1460]
+- LogicalRDD [id1#1456, val#1457], false
and
Filter (isnotnull(key1#1464) && (1 = key1#1464))
+- LogicalRDD [key1#1464, key2#1465L, val#1466L], false
Join condition is missing or trivial.
Use the CROSS JOIN syntax to allow cartesian products between these relations.;
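
To make the plan above concrete, here is a minimal pure-Python sketch (no Spark needed) of what it seems to describe. This is an illustration only, not Spark's implementation: the helper names `left_join_on_literal` and `filter_then_pair` are hypothetical, and the rows are copied from the question. Because id2 is the same literal for every left row, the condition id2 == key1 only tests the right side, so it behaves like a filter on rdd2 followed by pairing every left row with every surviving right row.

```python
# Rows copied from the question; plain tuples stand in for DataFrame rows.
left = [("1", "a"), ("2", "b"), ("3", "c")]        # (id1, val)
right = [("1", 2, 1), ("2", 3, 0), ("3", 3, 1)]    # (key1, key2, val)

ID2 = "1"  # the literal column: identical for every left row

def left_join_on_literal(left, right):
    """Left outer join evaluated row by row on id2 == key1 (hypothetical helper)."""
    out = []
    for l in left:
        matches = [r for r in right if ID2 == r[0]]
        if matches:
            out.extend(l + (ID2,) + r for r in matches)
        else:
            # Left outer join keeps unmatched left rows, padded with nulls.
            out.append(l + (ID2,) + (None, None, None))
    return out

def filter_then_pair(left, right):
    """Filter the right side once, then pair it with every left row (hypothetical helper)."""
    filtered = [r for r in right if r[0] == ID2]
    if not filtered:
        return [l + (ID2,) + (None, None, None) for l in left]
    # No condition relates a left row to a right row any more,
    # so this step is a cartesian product of left and filtered.
    return [l + (ID2,) + r for l in left for r in filtered]

joined = left_join_on_literal(left, right)
paired = filter_then_pair(left, right)
```

Both functions return the same rows, which suggests why the plan shows a Filter on key1 and then reports the remaining join condition as missing or trivial. In Spark 2.x the setting spark.sql.crossJoin.enabled controls whether such a plan is allowed to run; whether enabling it is appropriate here is a separate judgment call.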
sivaguru
  • In that example, where 1=1 was treated as a cartesian product, but in my case the condition should run like a filter, correct? – sivaguru Dec 11 '18 at 11:48

0 Answers