I am using PySpark 2.0 to create a DataFrame by reading a CSV file:

data = spark.read.csv('data.csv', header=True)

I check the type of data with

type(data)

The result is

pyspark.sql.dataframe.DataFrame

I am trying to convert some columns in data to LabeledPoint in order to apply a classification algorithm.

from pyspark.sql.types import *
from pyspark.sql.functions import col
from pyspark.mllib.regression import LabeledPoint

data.select(['label', 'features']).map(
    lambda row: LabeledPoint(row.label, row.features))

I ran into this error:

AttributeError: 'DataFrame' object has no attribute 'map'

Any idea what causes the error? Is there a way to generate LabeledPoint objects from a DataFrame in order to perform classification?

    Does this answer your question? [AttributeError: 'DataFrame' object has no attribute 'map'](https://stackoverflow.com/questions/39535447/attributeerror-dataframe-object-has-no-attribute-map) – Yosi Dahari Feb 20 '21 at 14:01

1 Answer

Use .rdd.map:

>>> data.select(...).rdd.map(...)

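PySpark's DataFrame.map was removed in Spark 2.0, so the call has to go through the DataFrame's underlying RDD (data.rdd), which still provides map.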
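
For the LabeledPoint case in the question, a minimal sketch could look like the following. It assumes the features live in separate numeric CSV columns; the helper names parse_row and to_labeled_points and the column names 'f1' and 'f2' are hypothetical, not from the question. Because spark.read.csv without inferSchema=True reads every column as a string, the values are cast to float explicitly.

```python
def parse_row(row, label_col, feature_cols):
    """Convert one row to (label, features), casting every value to float.

    Works with anything indexable by column name, such as a pyspark Row.
    """
    label = float(row[label_col])
    features = [float(row[c]) for c in feature_cols]
    return label, features


def to_labeled_points(df, label_col, feature_cols):
    """Return an RDD of LabeledPoint built from the given DataFrame columns."""
    # Imported here so parse_row stays usable without pyspark installed.
    from pyspark.mllib.regression import LabeledPoint

    def make_point(row):
        label, features = parse_row(row, label_col, feature_cols)
        return LabeledPoint(label, features)

    # The DataFrame itself has no .map in PySpark 2.x; use its underlying RDD.
    return df.select([label_col] + list(feature_cols)).rdd.map(make_point)
```

With the DataFrame from the question, this would be called as to_labeled_points(data, 'label', ['f1', 'f2']), with the real feature column names substituted for the hypothetical ones.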