I'm looking for a way to add an 'id' column to my DataFrame (dfProc) containing sequential numbers from 1 (or 0) up to the number of rows. In this example it has 10 rows, but the row count of my real DataFrame varies.
The content of my dfProc:
+-----+-------+------------+
|op_id|op_name|op_procedure|
+-----+-------+------------+
|   90|     39|           4|
|   91|     39|           5|
|   98|     39|           8|
|  111|     39|          11|
|  113|     39|          13|
|  104|     39|          14|
|   94|     39|          15|
|   96|     39|          17|
|   97|     39|          18|
|   93|     39|          21|
+-----+-------+------------+
The final result that I want is:
+-----+-------+------------+---+
|op_id|op_name|op_procedure|id |
+-----+-------+------------+---+
|   90|     39|           4|  1|
|   91|     39|           5|  2|
|   98|     39|           8|  3|
|  111|     39|          11|  4|
|  113|     39|          13|  5|
|  104|     39|          14|  6|
|   94|     39|          15|  7|
|   96|     39|          17|  8|
|   97|     39|          18|  9|
|   93|     39|          21| 10|
+-----+-------+------------+---+
Note: I'm using PySpark 1.5.2 and I can't upgrade to another version.
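
For reference, one direction that might fit this constraint is the RDD method zipWithIndex, which pairs each row with a consecutive 0-based index. The following is only a sketch and has not been verified against 1.5.2: the helper names append_id and with_sequential_id are hypothetical, and sql_context is assumed to be an already existing SQLContext.

```python
def append_id(pair, start=1):
    """Turn a (row, zero_based_index) pair into a plain tuple with the id appended.

    Hypothetical helper: `pair` is one element produced by RDD.zipWithIndex().
    """
    row, index = pair
    return tuple(row) + (index + start,)


def with_sequential_id(df, sql_context, start=1):
    """Return a new DataFrame with an extra 'id' column numbered from `start`.

    Assumes `df` is a Spark DataFrame and `sql_context` an existing SQLContext.
    """
    # zipWithIndex yields (row, index) pairs with consecutive indices from 0.
    indexed = df.rdd.zipWithIndex().map(lambda pair: append_id(pair, start))
    # Passing a list of column names lets Spark infer the column types.
    return sql_context.createDataFrame(indexed, df.columns + ["id"])
```

It would be called as dfProc = with_sequential_id(dfProc, sqlContext). The ids follow the order of the underlying RDD partitions, so they only match the displayed order if the DataFrame has a defined ordering.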