I have a Glue script written in PySpark, and I need to create a unique surrogate key. I've been using `row_number` and `monotonically_increasing_id`, and it works on the first job, but every time I upload new files and run the job again the numbering starts back at 1. Any guidance on how I can keep the sequence continuing when new files are added? For more info, I'm loading the data into an Oracle database.

Current table

Column A Column B
Jon 1
doe 2

Table after an upload

Column A Column B
Jon 1
doe 2
Jean 1

What I want after an upload

Column A Column B
Jon 1
doe 2
Jean 3
clevermrj
  • Unless there is any hidden context to the question (in which case you need to [edit] the question to include a [MRE] with an example of the code you are using) then the answer is to use a `SEQUENCE` or an `IDENTITY` column within the database (and not trying to manage this from your script). – MT0 Nov 17 '22 at 15:10
  • That was an option but with that route I run into roadblocks of having to match the id to certain files to populate more tables. – clevermrj Nov 17 '22 at 15:14
  • See my previous comment regarding hidden context; you have not explained any of that in the question and if you do not explain why you should not do it a certain way (that is otherwise best-practice) then we will suggest doing it that way (and this is a duplicate question). If you don't think it is a duplicate question then [edit] the question to explain why. – MT0 Nov 17 '22 at 15:17
  • This is exactly my scenario: https://stackoverflow.com/q/59399926/19095452 – clevermrj Nov 17 '22 at 20:04
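
If the key really has to be generated in the script rather than by a `SEQUENCE` or `IDENTITY` column, one common workaround is to read the current maximum key from the target table and add it to `row_number()` for each new batch. The snippet below is only a sketch under assumptions: the function names `key_offset` and `add_surrogate_key` and the column names `column_a` and `column_b` are hypothetical, and it assumes only one job writes to the table at a time (two concurrent runs could read the same maximum and produce duplicate keys).

```python
from typing import Optional


def key_offset(max_existing_key: Optional[int]) -> int:
    """Return the value to add to row_number(); 0 when the target table is empty."""
    return 0 if max_existing_key is None else int(max_existing_key)


def add_surrogate_key(new_df, max_existing_key, order_col="column_a", key_col="column_b"):
    """Number the new batch 1..n, then shift it past the keys already in the table.

    order_col and key_col are hypothetical column names used for illustration.
    """
    # Imported here so key_offset can be used without a Spark installation.
    from pyspark.sql import Window, functions as F

    # A window with no partitionBy moves all rows into a single partition.
    # That is fine for small batches but can be slow for very large ones.
    w = Window.orderBy(order_col)
    offset = key_offset(max_existing_key)
    return new_df.withColumn(key_col, F.row_number().over(w) + F.lit(offset))
```

In the job, `max_existing_key` would come from the Oracle table before the write, for example by loading a `SELECT MAX(...)` query through Spark's JDBC reader and taking the first row. With the tables in the question, the maximum is 2, so a batch containing only Jean gets `row_number()` 1 plus the offset 2, which gives 3.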
