Does anyone have experience with LIKE joins in Spark SQL? I'm on Spark 1.5.0. Compare this query:
sqlContext.sql("SELECT COUNT(*) FROM data a JOIN tokens b WHERE a.text LIKE CONCAT('%', b.token, '%')")
versus something uglier like this:
sqlContext.sql("SELECT * FROM data a WHERE a.text LIKE '%token1%' UNION ALL SELECT * FROM data a WHERE a.text LIKE '%token2%' UNION ALL ....")
or anything similar that avoids a LIKE join between the two tables. The data table would have tens of millions of rows with a text column of about 100 characters, and the tokens table thousands of tokens (some of which contain %). The second query runs much faster. The LIKE join takes ages, and its execution looks suspicious: the time to finish tasks seems to grow exponentially, whereas I'd expect each partition to take about the same time to finish.
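To make the cost and the semantics of the JOIN query concrete, here is a plain-Python sketch (no Spark involved; the names like_to_regex and like_join_count are hypothetical). Because the join condition contains no equality between columns, there is no key to hash or sort on, so every row has to be tested against every token: tens of millions of rows times thousands of tokens is on the order of 10^10 LIKE checks. The sketch also shows that a % inside a token acts as a wildcard, so such a token can match text that does not literally contain it. The LIKE translation is simplified and ignores escape characters.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into a compiled Python regex.

    '%' matches any sequence and '_' matches exactly one character.
    Simplification: LIKE escape characters are not handled.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # DOTALL so '%' also spans newlines; callers use fullmatch, like LIKE.
    return re.compile("".join(parts), re.DOTALL)


def like_join_count(texts, tokens):
    """Count matching (text, token) pairs, mirroring the JOIN query.

    With no equality key, every text is checked against every token,
    i.e. len(texts) * len(tokens) LIKE evaluations. The pattern is
    rebuilt for each pair, as CONCAT('%', token, '%') does per row.
    """
    count = 0
    for text in texts:
        for token in tokens:
            if like_to_regex("%" + token + "%").fullmatch(text):
                count += 1
    return count
```

For example, with texts ["spark sql join", "hello world", "100% sure"] and tokens ["sql", "o w", "0%s"], the LIKE semantics count 3 pairs, while a literal substring test finds only 2, because "0%s" matches "100% sure" through its wildcard. The UNION ALL version performs the same number of checks, so the speed difference presumably comes from how each check is evaluated: its patterns are constants, whereas the join builds a new pattern for every pair. That is an assumption about Spark 1.5 internals, not something this sketch demonstrates.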
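The UNION ALL query does not have to be written out by hand. A minimal sketch, assuming the tokens have already been collected into a Python list on the driver (the helper build_union_query and its parameters are hypothetical): it emits one LIKE branch per token and joins the branches with UNION ALL. String-literal escaping rules differ between SQL dialects, so instead of guessing, the sketch rejects any token containing a single quote.

```python
def build_union_query(tokens, table="data", column="text"):
    """Build one SELECT ... LIKE '%token%' per token, glued with UNION ALL.

    Assumption: tokens contain no single quotes; such tokens raise
    ValueError rather than being escaped in a dialect-specific way.
    A % inside a token is left as-is, so it still acts as a wildcard.
    """
    branches = []
    for token in tokens:
        if "'" in token:
            raise ValueError(f"token contains a single quote: {token!r}")
        branches.append(
            f"SELECT * FROM {table} a WHERE a.{column} LIKE '%{token}%'"
        )
    return " UNION ALL ".join(branches)
```

Usage would then be sqlContext.sql(build_union_query(tokens)).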
Thanks