
I want to create a column named "id" that holds row numbers, to be used later for pair generation.

I did it in Python as shown below. Can anyone suggest how to do it in PySpark?

con_2['id'] = range(1, 1 + len(con_2))   # sequential row numbers starting at 1
len(con_2.customer_play_id.unique())

My PySpark code is below, but it's not working:

from pyspark.sql import functions as F
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

slen = udf(lambda s: len(s), IntegerType())  # defined but not used below
# This assigns the string length of customer_play_id, not a row number
con_2 = con_2.withColumn('id', F.length(con_2.customer_play_id))

The expected output should be ("id" is the column I want to add):

id  col1  col2
1   X     Y
2   y1    y4
3   y2    y7
4   y3    y8
  • Possible duplicate of [Pyspark add sequential and deterministic index to dataframe](https://stackoverflow.com/questions/52318016/pyspark-add-sequential-and-deterministic-index-to-dataframe) – pault May 21 '19 at 14:05

1 Answer

from pyspark.sql.window import Window as W
from pyspark.sql import functions as F

# row_number() over an ordering window assigns sequential ids starting at 1
con_2 = con_2.withColumn("id", F.row_number().over(W.orderBy("customer_play_id")))
con_2.show()
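
Note that an ordering window with no partitionBy pulls every row into a single partition, which can be slow on large DataFrames. If the goal is just a deterministic 1..N index in the existing row order, a minimal sketch using zipWithIndex on the underlying RDD avoids the global sort (the toy column names col1/col2 are taken from the expected output above, not from the asker's real data):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy DataFrame standing in for con_2, with the columns from the expected output
df = spark.createDataFrame(
    [("X", "Y"), ("y1", "y4"), ("y2", "y7"), ("y3", "y8")],
    ["col1", "col2"],
)

# zipWithIndex attaches a 0-based position to each row; shift it to start at 1
indexed = df.rdd.zipWithIndex().map(lambda pair: (pair[1] + 1,) + tuple(pair[0]))
df_with_id = indexed.toDF(["id"] + df.columns)
df_with_id.show()

The tradeoff is a round trip through the RDD API; the Window-based answer above is fine when the DataFrame is small enough to sort in a single partition.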