I am trying to run logistic regression on a simple data set to understand the syntax of PySpark. My data has 11 columns: the first 10 columns are features and the last (11th) column is the label. I want to pass these 10 columns as features and the 11th column as the label, but I only know how to pass a single column as a feature, using featuresCol="col_header_name". I read the data from a CSV file using pandas and converted it into a Spark DataFrame. Here is the code:

from pyspark.ml.classification import LogisticRegression
from pyspark.sql import SQLContext
from pyspark import SparkContext
import pandas as pd
data = pd.read_csv('abc.csv')
sc = SparkContext("local", "App Name")
sql = SQLContext(sc)
spDF = sql.createDataFrame(data)
tri=LogisticRegression(maxIter=10,regParam=0.01,featuresCol="single_column",labelCol="label")
lr_model = tri.fit(spDF)

If I use featuresCol=[list_of_header_names], I get errors. I have used scikit-learn, which has a really simple syntax, something like:

reg=LogisticRegression()
reg=reg.fit(Dataframe_of_features,Label_array)
A-ar

1 Answer

You need to combine all the feature columns into a single vector column using VectorAssembler.

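from pyspark.ml.feature import VectorAssembler
# list_of_header_names is a Python list holding the 10 feature column names
assembler = VectorAssembler(inputCols=list_of_header_names, outputCol="features")
# transform() appends the "features" vector column to the DataFrame
spDF = assembler.transform(spDF)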

You can then pass the assembled vector column as the features input to the logistic regression.

tri=LogisticRegression(maxIter=10,
                       regParam=0.01,
                       featuresCol="features",
                       labelCol="label")
lr_model = tri.fit(spDF)
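
To make the assembling step more concrete, here is a minimal pure-Python sketch of the idea, not the pyspark API. The `assemble` helper, the column names `f1` and `f2`, and the sample rows are all hypothetical; the real VectorAssembler produces a Spark vector column rather than a Python list.

```python
def assemble(rows, input_cols, output_col="features"):
    """Return copies of the rows with the input columns packed into one list.

    Hypothetical helper that only mimics the idea behind VectorAssembler.
    """
    return [
        {**row, output_col: [float(row[col]) for col in input_cols]}
        for row in rows
    ]


# Made-up sample data: two feature columns and a label column.
rows = [
    {"f1": 1, "f2": 2.5, "label": 0},
    {"f1": 3, "f2": 4.0, "label": 1},
]

assembled = assemble(rows, input_cols=["f1", "f2"])
for row in assembled:
    print(row["features"], row["label"])
```

Each row keeps its original columns and gains one extra column holding all the feature values, which is the shape that featuresCol expects.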
Daniel Schneider
pratiklodha
  • It works, thanks! Just one more thing: what are maxIter, regParam and elasticNetParam? – A-ar Feb 19 '19 at 08:05
  • maxIter is the maximum number of iterations and regParam is the regularization parameter. elasticNetParam specifies whether you want the regularization penalty to be L1 or L2. – pratiklodha Feb 19 '19 at 09:17
  • Thanks, but I do know the full forms! I wanted to know their purpose. – A-ar Feb 19 '19 at 16:22
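
On the purpose of the parameters asked about in the comments: maxIter caps how many optimizer iterations are run, regParam scales how strongly large coefficients are penalized, and elasticNetParam mixes the L1 and L2 penalties (0 gives pure L2, 1 gives pure L1). The sketch below computes the elastic-net penalty in plain Python, assuming the form given in the Spark MLlib documentation, regParam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2). The function name and the sample weights are hypothetical.

```python
def elastic_net_penalty(weights, reg_param, elastic_net_param):
    """Penalty added to the training loss for a given coefficient vector.

    Assumes the elastic-net form described in the Spark MLlib docs.
    """
    l1_norm = sum(abs(w) for w in weights)
    l2_norm_squared = sum(w * w for w in weights)
    return reg_param * (
        elastic_net_param * l1_norm
        + (1.0 - elastic_net_param) * 0.5 * l2_norm_squared
    )


weights = [0.5, -1.0, 2.0]  # made-up coefficients for illustration

pure_l2 = elastic_net_penalty(weights, reg_param=0.01, elastic_net_param=0.0)
pure_l1 = elastic_net_penalty(weights, reg_param=0.01, elastic_net_param=1.0)
mixed = elastic_net_penalty(weights, reg_param=0.01, elastic_net_param=0.5)

print(pure_l2, pure_l1, mixed)
```

A larger regParam pushes the coefficients toward zero. Moving elasticNetParam toward 1 favors sparse solutions, where some coefficients become exactly zero.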