1

I am working on one code where I need to add sequential number as per the grouping on the basis of column A & column B. Below is the table/dataframe I have. The data is sorted by colA & Date.

colA colB Date
A 1 01-01-2014
A 1 01-02-2014
A 3 30-04-2014
A 3 05-05-2014
A 2 25-05-2014
A 1 06-06-2014
A 1 21-07-2014
B 1 04-09-2014
B 1 19-10-2014
B 1 03-12-2014
C 3 17-01-2015
C 2 03-03-2015
C 2 17-04-2015

Following is the expected result:

colA colB Date ROWNUM
A 1 01-01-2014 1
A 1 01-02-2014 2
A 3 30-04-2014 1
A 3 05-05-2014 2
A 2 25-05-2014 1
A 1 06-06-2014 1
A 1 21-07-2014 2
B 1 04-09-2014 1
B 1 19-10-2014 2
B 1 03-12-2014 3
C 3 17-01-2015 1
C 2 03-03-2015 1
C 2 17-04-2015 2

I am trying to use row_number here and getting the below result where for A(colA) the next occurrence of 1(colB) has rownumber added as 3 instead of 1:

colA colB Date ROWNUM
A 1 01-01-2014 1
A 1 01-02-2014 2
A 3 30-04-2014 1
A 3 05-05-2014 2
A 2 25-05-2014 1
A 1 06-06-2014 3
A 1 21-07-2014 4
B 1 04-09-2014 1
B 1 19-10-2014 2
B 1 03-12-2014 3
C 3 17-01-2015 1
C 2 03-03-2015 1
C 2 17-04-2015 2
Ronak Jain
  • 3,073
  • 1
  • 11
  • 17
Minal
  • 13
  • 3

1 Answers1

0

This would work:

w=Window.orderBy(F.asc("colA"), F.asc("Date"))

df\
.withColumn("eq", F.when(F.lag("colB").over(w)==F.col("colB"), 0).otherwise(1))\
.withColumn("groups", F.sum("eq").over(w))\
.withColumn("row_num", F.row_number().over(w.partitionBy(F.col("colA"), F.col("groups"))))\
.show()

Thanks to this answer: Window functions: PARTITION BY one column after ORDER BY another

Input:

+----+----+----------+
|colA|colB|      Date|
+----+----+----------+
|   A|   1|2014-01-01|
|   A|   1|2014-02-01|
|   A|   3|2014-04-30|
|   A|   3|2014-05-05|
|   A|   2|2014-05-25|
|   A|   1|2014-06-06|
|   A|   1|2014-07-21|
|   B|   1|2014-09-04|
|   B|   1|2014-10-19|
|   B|   1|2014-12-03|
|   C|   3|2015-01-17|
|   C|   2|2015-03-03|
|   C|   2|2015-04-17|
+----+----+----------+

Output (Keeping the intermediate columns for understanding, you can drop them):

+----+----+----------+---+------+-------+
|colA|colB|      Date| eq|groups|row_num|
+----+----+----------+---+------+-------+
|   A|   1|2014-01-01|  1|     1|      1|
|   A|   1|2014-02-01|  0|     1|      2|
|   A|   3|2014-04-30|  1|     2|      1|
|   A|   3|2014-05-05|  0|     2|      2|
|   A|   2|2014-05-25|  1|     3|      1|
|   A|   1|2014-06-06|  1|     4|      1|
|   A|   1|2014-07-21|  0|     4|      2|
|   B|   1|2014-09-04|  0|     4|      1|
|   B|   1|2014-10-19|  0|     4|      2|
|   B|   1|2014-12-03|  0|     4|      3|
|   C|   3|2015-01-17|  1|     5|      1|
|   C|   2|2015-03-03|  1|     6|      1|
|   C|   2|2015-04-17|  0|     6|      2|
+----+----+----------+---+------+-------+
Ronak Jain
  • 3,073
  • 1
  • 11
  • 17