
I want to create a column named "id" that holds row numbers, to be used later for pair generation.

I did it in Python as shown below. Can anyone suggest how to do it in PySpark?

con_2['id'] = range(1, 1 + len(con_2))   # sequential row numbers starting at 1
len(con_2.customer_play_id.unique())

My PySpark code is below, but it's not working:

from pyspark.sql import functions as F
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

slen = udf(lambda s: len(s), IntegerType())  # defined but not used below
# This assigns the string length of customer_play_id, not a row number
con_2 = con_2.withColumn('id', F.length(con_2.customer_play_id))

The expected output should be ("id" is the column I want to add):

id  col1  col2
1   X     Y
2   y1    y4
3   y2    y7
4   y3    y8
  • Possible duplicate of [Pyspark add sequential and deterministic index to dataframe](https://stackoverflow.com/questions/52318016/pyspark-add-sequential-and-deterministic-index-to-dataframe) – pault May 21 '19 at 14:05

1 Answer

from pyspark.sql.window import Window as W
from pyspark.sql import functions as F

# row_number() over an ordering window assigns sequential ids starting at 1
con_2 = con_2.withColumn("id", F.row_number().over(W.orderBy("customer_play_id")))
con_2.show()
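
Note that an ordering window with no partitionBy pulls every row into a single partition, which can be slow on large DataFrames. If the goal is just a deterministic 1..N index in the existing row order, a minimal sketch using zipWithIndex on the underlying RDD avoids the global sort (the toy column names col1/col2 are taken from the expected output above, not from the asker's real data):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy DataFrame standing in for con_2, with the columns from the expected output
df = spark.createDataFrame(
    [("X", "Y"), ("y1", "y4"), ("y2", "y7"), ("y3", "y8")],
    ["col1", "col2"],
)

# zipWithIndex attaches a 0-based position to each row; shift it to start at 1
indexed = df.rdd.zipWithIndex().map(lambda pair: (pair[1] + 1,) + tuple(pair[0]))
df_with_id = indexed.toDF(["id"] + df.columns)
df_with_id.show()

The tradeoff is a round trip through the RDD API; the Window-based answer above is fine when the DataFrame is small enough to sort in a single partition.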