Dataframe

Above is my dataframe after performing PCA on my features. I want to plot the PCA features (i.e. the values) with hue set to the article, in PySpark.

I tried the following code:

def abc(x):
    lst = x['values']
    return lst[0], lst[1], lst[2]

pca_df['col1'], pca_df['col2'], pca_df['col3'] = pca_df.map(lambda x: abc(x.select('PCA_features')))

I am getting the below error:

AttributeError: 'DataFrame' object has no attribute 'map'

Can someone help me extract the features into separate columns (feature1, feature2, feature3) in the dataframe, alongside the articles, so that I can plot them? Or suggest another way to do the same?

  • Could you show the schema of your dataframe? – Jonathan Lam Sep 01 '22 at 06:34
  • @JonathanLam, ``article: string(nullable = True), PCA_features: vector(nullable= True)`` – snigdha mohapatra Sep 01 '22 at 07:22
  • @JonathanLam, yes, the above link helped me access the elements of the vector, but I am still not able to plot them. ``pca_df = df.withColumn('PC_1',ith("pca",lit(0))).withColumn('PC_2',ith("pca",lit(1)))`` ``sns.scatterplot(data = pca_df, x='PC_1', y='PC_2', hue = 'article')`` ``plt.title('PCA Features')`` ``plt.show()`` This gives the error ``ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.`` – snigdha mohapatra Sep 01 '22 at 11:47
  • What type is `pca_df`? If it is a `spark-df` or a `pandas-df`, there is no `map` function available. – s510 Sep 01 '22 at 13:51
