How to add an id column to a variable rows dataframe in pyspark

Question

I'm looking for a way to add an 'id' column to my dataframe (dfProc) containing sequential numbers from 1 (or 0) up to the number of rows. This example has 10 rows, but the number of rows in my real dataframe varies.

The content of my dfProc:

 +-----+-------+------------+
 |op_id|op_name|op_procedure|
 +-----+-------+------------+
 |   90|     39|           4|
 |   91|     39|           5|
 |   98|     39|           8|
 |  111|     39|          11|
 |  113|     39|          13|
 |  104|     39|          14|
 |   94|     39|          15|
 |   96|     39|          17|
 |   97|     39|          18|
 |   93|     39|          21|
 +-----+-------+------------+

The final result that I want is:

 +-----+-------+------------+---+
 |op_id|op_name|op_procedure| id|
 +-----+-------+------------+---+
 |   90|     39|           4|  1|
 |   91|     39|           5|  2|
 |   98|     39|           8|  3|
 |  111|     39|          11|  4|
 |  113|     39|          13|  5|
 |  104|     39|          14|  6|
 |   94|     39|          15|  7|
 |   96|     39|          17|  8|
 |   97|     39|          18|  9|
 |   93|     39|          21| 10|
 +-----+-------+------------+---+

Note: I'm using PySpark 1.5.2 and I can't upgrade to another version.

