My code doesn't call toDF directly, but apparently IndexedRowMatrix does internally. The code runs fine in the pyspark shell, but when I run it with spark-submit I get the following error:
Traceback (most recent call last):
  File "/pathtofile/code.py", line 22, in <module>
    mat = IndexedRowMatrix(mat_indexed_rows)
  File "/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/python/lib/pyspark.zip/pyspark/mllib/linalg/distributed.py", line 232, in __init__
AttributeError: 'PipelinedRDD' object has no attribute 'toDF'
The error is triggered by this line:
mat = IndexedRowMatrix(mat_indexed_rows)
I saw related questions that suggested using SQLContext, but switching to it breaks other lines of my code that need a SparkContext object.
Is there any way to get around this error?
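
One possible workaround, sketched under the assumption that the missing attribute comes from PySpark only attaching toDF to RDDs once a SQLContext (or, in Spark 2.x, a SparkSession) has been instantiated: the pyspark shell creates one automatically, while a script run through spark-submit has to do it itself. The SQLContext can be created alongside the existing SparkContext rather than replacing it. The helper name build_indexed_row_matrix and its rows argument are hypothetical, used only for illustration.

```python
def build_indexed_row_matrix(sc, rows):
    """Build an IndexedRowMatrix from an iterable of (index, values) pairs.

    `sc` is the existing SparkContext; it is used as-is and not replaced.
    `rows` is a hypothetical input, e.g. [(0, [1.0, 2.0]), (1, [3.0, 4.0])].
    """
    # Local imports so the helper can be defined even where Spark is absent.
    from pyspark.sql import SQLContext
    from pyspark.mllib.linalg.distributed import IndexedRow, IndexedRowMatrix

    # Assumption: creating a SQLContext is what makes rdd.toDF() available.
    # The object is kept only for that side effect; `sc` stays a SparkContext.
    sql_context = SQLContext(sc)

    # Wrap each (index, values) pair in an IndexedRow, as the matrix expects.
    indexed_rows = sc.parallelize(rows).map(lambda r: IndexedRow(r[0], r[1]))
    return IndexedRowMatrix(indexed_rows)
```

In a spark-submit script this would be called after creating the context, for example mat = build_indexed_row_matrix(sc, [(0, [1.0, 2.0]), (1, [3.0, 4.0])]), and every other line can keep using sc as before.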