
I'm working on a Spark MLlib algorithm. The dataset I have is in this form:

"Company":"XXXX","CurrentTitle":"XYZ","Edu_Title":"ABC","Exp_mnth":. (there are more values similar to these)

I'm trying to convert the String values to numeric values, so I tried using zipWithUniqueId to get a unique value for each of the string values. For some reason I'm not able to save the modified dataset to disk. Can I do this in some way using Spark SQL, or what would be a better approach?

ZygD
KM-Yash

1 Answer


Scala

import org.apache.spark.sql.functions.monotonically_increasing_id
val dataFrame1 = dataFrame0.withColumn("index",monotonically_increasing_id())

Java

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.functions;
Dataset<Row> dataFrame1 = dataFrame0.withColumn("index", functions.monotonically_increasing_id());
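
One thing worth knowing about the snippets above: the generated IDs are unique and monotonically increasing, but they are not consecutive. According to the Spark documentation, the current implementation puts the partition ID in the upper 31 bits and the record number within each partition in the lower 33 bits. The plain-Java sketch below only imitates that bit layout to show what the values look like; it does not call Spark, and the class name `MonotonicIdSketch`, the helper `idFor`, and the two-partition setup are hypothetical names chosen for illustration.

```java
public class MonotonicIdSketch {

    // Assumption taken from the Spark docs for the current implementation:
    // upper 31 bits = partition ID, lower 33 bits = row number in that partition.
    static long idFor(int partitionId, long rowInPartition) {
        return ((long) partitionId << 33) + rowInPartition;
    }

    public static void main(String[] args) {
        // Two hypothetical partitions with three rows each.
        for (int partition = 0; partition < 2; partition++) {
            for (long row = 0; row < 3; row++) {
                System.out.println("partition " + partition
                        + ", row " + row
                        + " -> id " + idFor(partition, row));
            }
        }
    }
}
```

With this layout, the first partition gets 0, 1, 2 and the second starts at 8589934592, so the "index" column is fine as a unique key but should not be treated as a dense 0..n-1 row number.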
Yugerten