
In MySQL I number the rows of a result set with a user variable, like this:

SELECT *,@a:=@a+1 AS rn FROM PERSON,(SELECT(@a:=0)) AS A

I want to do the same thing in Spark SQL when querying a DataFrame, along these lines:

val strSQL=" SELECT *,@a:=@a+1 AS rn FROM PERSON,(SELECT(@a:=0)) AS A"
sqlContext.sql(strSQL)

My problem is that when I run this code, the console tells me that @a:=@a+1 is invalid. I don't know how to write this query in Spark SQL syntax.

jarlh
lee

1 Answer


If you want to generate a row number, one option is to zip the underlying RDD with a unique id and prepend it to each row:

val schema = df.schema // original DataFrame schema

// prepend a unique id to every row
val rows = df.rdd.zipWithUniqueId.map {
   case (r: Row, id: Long) => Row.fromSeq(id +: r.toSeq)}

Then create a DataFrame with the extended schema:

val dfWithPK = sqlContext.createDataFrame(
  rows, StructType(StructField("rowNumber", LongType, false) +: schema.fields))

You can also use the SQL row_number() window function to generate a row number:

sqlContext.sql("select row_number() over (order by columnName) as rowNumber from Person")
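As a rough illustration of what `row_number() over (order by columnName)` computes, here is the same operation on a local Scala collection (hypothetical data; real Spark evaluates the window distributedly):

```scala
val ages = Seq(42, 17, 23)
// sort by the ordering column, then number from 1 --
// this is the semantics of row_number() over (order by age)
val withRowNumber = ages.sorted.zipWithIndex.map { case (age, i) => (age, i + 1) }
// withRowNumber: Seq((17,1), (23,2), (42,3))
```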

Or you can use the monotonicallyIncreasingId function to generate unique row ids:

df.withColumn("RowNumber",monotonicallyIncreasingId)

Hope this helps!

koiralo
  • Point to note here, `monotonicallyIncreasingID` generates non-deterministic values in increasing order. Not exactly what the OP wants. – philantrovert May 22 '17 at 10:19