1

I would like to create a unique_id for each line in my dataframe, basing it on the date.

df1:

+---+-----+----+-------+-----+
|day|month|year| userid|units|
+---+-----+------------+-----+
| 01|   01|2016|87cb11 |    0|
| 01|   01|2016|87cb11 |    1|
| 01|   01|2016|87cb11 |    2|
| 02|   01|2016|87cb11 |    0|
| 02|   01|2016|87cb11 |    1|
| 02|   01|2016|87cb11 |    2|
+---+-----+----+-------+-----+

I have tried to use monotonically_increasing_id() but I am unsure of how to create an increasing number which will have a certin lenght.

df2:

+---+-----+----+-------+-----+---------------+
|day|month|year| userid|units| unique_id     |
+---+-----+------------+-----+---------------+
| 01|   01|2016|87cb11 |    0|201601010000001|
| 01|   01|2016|87cb11 |    1|201601010000002|
| 01|   01|2016|87cb11 |    2|201601010000003|
| 02|   01|2016|87cb11 |    0|201601020000001|
| 02|   01|2016|87cb11 |    1|201601020000002|
| 02|   01|2016|87cb11 |    2|201601020000003|
+---+-----+----+-------+-----+---------------+
Data_101
  • 893
  • 7
  • 14
  • 25
  • 3
    use row_number with window function and then concat with date . that should be it – Ramesh Maharjan Oct 01 '18 at 11:36
  • Convert the `row_number` to a string and then concat as Ramesh said. You can use c-style string formatting to zero pad the `row_number` to the appropriate length- see an example [here](https://stackoverflow.com/a/49696839/5858851). – pault Oct 01 '18 at 13:43

0 Answers0