Not in Java, as I do not specialize in that, but in Scala; it should be easy enough for you to convert. Just an example I have using a Dataset (DS) with case classes:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.{Encoder, Encoders}
import spark.implicits._
// Generate some example data via a DF. It could equally come from files, in which case the ordering in those files is assumed, i.e. no need to sort.
val df = Seq(
  "1 February", "n", "c", "b",
  "2 February", "hh", "www", "e",
  "3 February", "y", "s", "j",
  "1 March", "c", "b", "x",
  "1 March", "c", "b", "x",
  "2 March", "c", "b", "x",
  "3 March", "c", "b", "x", "y", "z"
).toDF("line")
// Define case classes to avoid dealing with Row when going DF --> RDD --> DF.
case class X(line: String)
// Note: after toDF the value column comes back as a nested struct, so Xtra models it as X rather than a plain String.
case class Xtra(key: Long, line: X)
// Add the seq num using zipWithIndex, then convert back; there will be a struct to deal with.
// You can avoid the struct if using Row and such, or by flattening (a flattened sketch follows the output below). But the general idea should be clear.
val rdd = df.as[X].rdd.zipWithIndex().map{case (v,k) => (k,v)}
val ds = rdd.toDF("key", "line").as[Xtra]
ds.show(100,false)
returns:
+---+------------+
|key|line |
+---+------------+
|0 |[1 February]|
|1 |[n] |
|2 |[c] |
...
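If you would rather not carry the nested struct at all, here is a minimal sketch (the flat case class name XtraFlat is mine, not part of the original) that maps straight from the typed RDD into a flat shape before going back to a Dataset:
// Hypothetical flat case class so that line is a plain String column, no struct.
case class XtraFlat(key: Long, line: String)
val flatDs = df.as[X].rdd.zipWithIndex().map{ case (x, idx) => XtraFlat(idx, x.line) }.toDS()
flatDs.show(100, false)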
The answers to date do not meet the needs as the question describes them, but if there are only 10K rows then the single partition is not an issue, although for only 10K rows one has to ask a few questions.
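For reference, the single-partition style those answers tend to rely on looks roughly like this. It is only a sketch; the unpartitioned window moves all rows to one task, which is tolerable at 10K rows but does not scale:
import org.apache.spark.sql.expressions.Window
// Unpartitioned window; Spark will warn that all data is being moved to a single partition.
val w = Window.orderBy(monotonically_increasing_id())
val withSeq = df.withColumn("key", row_number().over(w).cast("long") - 1)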
If you don't mind Row, here is another approach:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructField, StructType, LongType}
val df = sc.parallelize(Seq((1.0, 2.0), (0.0, -1.0), (3.0, 4.0), (6.0, -2.3))).toDF("x", "y")
val newSchema = StructType(df.schema.fields ++ Array(StructField("rowid", LongType, false)))
val rddWithId = df.rdd.zipWithIndex
val dfZippedWithId = spark.createDataFrame(rddWithId.map{ case (row, index) => Row.fromSeq(row.toSeq ++ Array(index))}, newSchema)
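A quick check of the result; ordering by the generated rowid just makes the output deterministic:
dfZippedWithId.printSchema()
dfZippedWithId.orderBy("rowid").show(false)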