I am new to PySpark.
I want to compute the correlation between an integer column and another column that holds the vector output of OneHotEncoder.

I use this code:
    import six

    for i in df.columns:
        # Skip columns whose first value is a string; everything else is passed to corr
        if not isinstance(df.select(i).take(1)[0][0], six.string_types):
            print("Correlation to label for", i, df.stat.corr('label', i))
I get this error when it computes the correlation between label and a OneHotEncoder column:
    Py4JJavaError: An error occurred while calling o9219.corr.
    : java.lang.IllegalArgumentException: requirement failed: Currently correlation calculation for columns with dataType org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 not supported
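
The message suggests that df.stat.corr only accepts numeric columns, and that my string check lets the vector column through. Below is a minimal sketch of a stricter filter based on the type names in df.dtypes. It assumes the one-hot column is reported there as 'vector' and that the numeric type names are the ones listed. The helper names is_numeric and numeric_columns are hypothetical, not part of PySpark, and sample_dtypes is made-up data standing in for df.dtypes.

```python
# Type names, as they appear in df.dtypes, that are assumed to be numeric.
NUMERIC_TYPES = {"tinyint", "smallint", "int", "bigint", "float", "double", "decimal"}


def is_numeric(type_name):
    """Return True if a df.dtypes type string names a numeric type."""
    # Strip parameters such as the "(10,2)" in "decimal(10,2)".
    base = type_name.split("(")[0]
    return base in NUMERIC_TYPES


def numeric_columns(dtypes):
    """Return the names of the numeric columns from a df.dtypes-style list."""
    return [name for name, type_name in dtypes if is_numeric(type_name)]


# Made-up stand-in for df.dtypes, which is a list of (name, type string) pairs.
sample_dtypes = [
    ("label", "int"),
    ("city", "string"),
    ("city_vec", "vector"),
    ("price", "decimal(10,2)"),
]
selected = numeric_columns(sample_dtypes)

# With a real DataFrame the loop would become:
# for c in numeric_columns(df.dtypes):
#     print("Correlation to label for", c, df.stat.corr("label", c))
```

This only skips the vector column instead of correlating it. To correlate label with the encoded values themselves, I assume the vector would first have to be expanded into separate numeric columns, for example with pyspark.ml.functions.vector_to_array in Spark 3.0 or later.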