After two days on this problem I found a solution. There are two related posts on Stack Overflow, but neither gave an effective solution.
1 - First, apply this UDF to convert the data:
from pyspark.sql import functions as F
from pyspark.sql import types as T
from pyspark.ml.linalg import DenseVector

def sparse_to_array(v):
    # Densify the (possibly sparse) vector and return plain Python floats
    v = DenseVector(v)
    return [float(x) for x in v]

sparse_to_array_udf = F.udf(sparse_to_array, T.ArrayType(T.FloatType()))
2 - Then apply it to the data.
# convert the vector column into a plain float-array column
df = pcaFeatures.withColumn('features_array', sparse_to_array_udf('features'))
If you then want to convert this array back to a Vector, please visit this website.
Convert it back to a Vector because, after this step, you are left with a plain array rather than a vector, and PCA (or other estimators) will throw the error below while fitting/transforming the data.
IllegalArgumentException: 'requirement failed: Column pcaFeatures_Norm
must be of type
struct<type:tinyint,size:int,indices:array<int>,values:array<double>>
but was actually array<float>.'