toDF error when using IndexedRowMatrix and spark-submit but works in pyspark

Asked Oct 01 '17 at 20:41

Active Oct 01 '17 at 22:19


My code doesn't call toDF directly, but IndexedRowMatrix apparently does. The code runs fine in the pyspark shell, but when I run it with spark-submit I get the following error:

Traceback (most recent call last):
  File "/pathtofile/code.py", line 22, in <module>
    mat = IndexedRowMatrix(mat_indexed_rows)
  File "/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/python/lib/pyspark.zip/pyspark/mllib/linalg/distributed.py", line 232, in __init__
AttributeError: 'PipelinedRDD' object has no attribute 'toDF'

The line that triggers it is:

mat = IndexedRowMatrix(mat_indexed_rows)

I saw related questions that suggested using SQLContext, but doing so breaks other lines of my code that need a SparkContext object.

Is there any way to get around this error?
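
One possible explanation, offered as an assumption rather than something confirmed in this post: in PySpark 1.x, `toDF` is not defined on RDDs up front but is attached to them as a side effect of constructing a `SQLContext`. The pyspark shell builds one at startup, while a script launched with spark-submit only has one if it creates it. If that is what is happening here, constructing `SQLContext(sc)` in addition to the existing SparkContext, before the `IndexedRowMatrix` call, would be the step to try; the SparkContext itself stays usable. The sketch below only models that mechanism in plain Python. `FakeRDD` and `FakeSQLContext` are hypothetical stand-ins, not PySpark classes.

```python
# Stand-in classes only: these are NOT the real PySpark classes.
# They model a constructor that attaches a method to another class.


class FakeRDD(object):
    """Hypothetical stand-in for an RDD; it defines no toDF of its own."""

    def __init__(self, rows):
        self.rows = list(rows)


class FakeSQLContext(object):
    """Hypothetical stand-in for a SQL context built on an existing context."""

    def __init__(self, spark_context):
        # The original context is wrapped and kept, not replaced.
        self.spark_context = spark_context

        # Side effect of construction: attach toDF to the RDD class.
        def toDF(rdd):
            return {"rows": rdd.rows}

        FakeRDD.toDF = toDF


rdd = FakeRDD([(0, [1.0, 2.0])])

# No SQL context exists yet (the assumed spark-submit situation).
has_todf_before = hasattr(rdd, "toDF")

# Placeholder for an already-created SparkContext.
sc = object()

# Assumed real-world analogue of this line: sqlContext = SQLContext(sc)
sql_context = FakeSQLContext(sc)

# The same rdd object now exposes toDF, and sc is still the same object.
has_todf_after = hasattr(rdd, "toDF")
same_context = sql_context.spark_context is sc
```

In this model the SQL context is created alongside the original context instead of in place of it, so code that needs the plain context object keeps working.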

cpd1

0 Answers