
I am working on a financial-analysis project to predict stock outcomes, and I am using PySpark for it.

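```python
from pyspark.ml.linalg import DenseVector  # assumed; pyspark.mllib.linalg also has a DenseVector
df2 = df.rdd.map(lambda x: (x[0], DenseVector(x[1:])))
```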

I ran this to turn an existing DataFrame into (label, features) rows, which I need for feature scaling and, later, for fitting a regression model. Instead, I get a long list of errors saying:

cannot convert String to Float
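A minimal plain-Python reproduction of what the `map` does to each row, assuming every column was read in as a string (for example, a CSV loaded without schema inference; this is an assumption about the data). `DenseVector` converts each element to a float, so the built-in `float()` stands in for that conversion here; `to_label_features` and the sample rows are hypothetical.

```python
def to_label_features(row):
    """Mimic `lambda x: (x[0], DenseVector(x[1:]))` with plain floats.

    DenseVector converts every element to a float64; float() is used here
    as a stand-in so the example runs without Spark or NumPy.
    """
    label = row[0]  # passed through unchanged, as in the original lambda
    features = [float(v) for v in row[1:]]  # raises ValueError on non-numeric strings
    return label, features


# Hypothetical rows as they might look if every column was read as a string.
clean_row = ("1.0", "12.5", "3", "-0.75")
dirty_row = ("0.0", "12.5", "N/A", "-0.75")

print(to_label_features(clean_row))  # ('1.0', [12.5, 3.0, -0.75])

try:
    to_label_features(dirty_row)
except ValueError as err:
    print(err)  # could not convert string to float: 'N/A'
```

Numeric strings such as `"12.5"` convert fine; only values like `"N/A"` trigger the error, which matches the message in the question.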

Please help!

Linus
Adhithya JD
  • Please provide [a Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve)!!! – Mr. T Feb 18 '18 at 18:04
  • The constructor for [`DenseVector`](http://spark.apache.org/docs/2.0.0/api/python/_modules/pyspark/mllib/linalg.html#DenseVector) takes in an array and tries to convert its contents to `np.float64`. If you're seeing this error, you almost certainly have strings (that cannot be converted to numeric) in your data. – pault Feb 18 '18 at 18:23
  • In order for people to help you, we're going to need to be able to reproduce your issue. See this post on [how to create good reproducible spark examples](https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-dataframe-examples). – pault Feb 18 '18 at 18:25
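Following pault's point that `DenseVector` converts its contents to `np.float64`, one way to find the offending values is to scan a sample of rows for fields that cannot be parsed as floats. This is a sketch with a hypothetical helper, `find_non_numeric`, and made-up sample data; plain `float()` stands in for the float64 conversion.

```python
def find_non_numeric(rows, start=1):
    """Return (row_index, column_index, value) for every field from `start`
    onward that float() cannot convert.

    start=1 mirrors x[1:] in the question, i.e. the label in column 0 is skipped.
    """
    problems = []
    for i, row in enumerate(rows):
        for j in range(start, len(row)):
            value = row[j]
            try:
                float(value)
            except (TypeError, ValueError):  # float(None) raises TypeError, so nulls are reported too
                problems.append((i, j, value))
    return problems


# Hypothetical sample rows; in PySpark these could come from something like df.take(100).
sample = [
    ("1.0", "12.5", "3", "-0.75"),
    ("0.0", "12.5", "N/A", "-0.75"),
    ("1.0", "", "4", None),
]
print(find_non_numeric(sample))  # [(1, 2, 'N/A'), (2, 1, ''), (2, 3, None)]
```

Because a PySpark `Row` is a tuple subclass, the rows returned by `df.take(n)` can be passed in directly. Once the bad columns are known, they can be cleaned or cast with `col(c).cast("double")` before building the vectors; depending on Spark's `spark.sql.ansi.enabled` setting, an unparsable string then becomes either null or an error.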

0 Answers