0

I have a large dataset with the following columns

id,
token1,
token2,
token4,
record

Now someone gave me a list of changes to be done to this table.

if token1 = xxx and token2 = yyy and token4 = zzz change record to some another value kkk

like this i have another 100k changes need to be done.

so they have given a csv file as

token1,token2,token4,record
xxxx,yyy,zzzz,kkk..
...
100K changes

Whats the best way to deal in this situation

My plan is

  • create a for loop in python
  • loop through each line in the csv file,
  • check if record exists then change the recordid to the corresponding recordid mentioned.

Can someone suggest whats the best way to do a bulk update with multiple where conditions.

I have to do this task on snowflake

Santhosh
  • 9,965
  • 20
  • 103
  • 243
  • Are the values for token1, token2, etc. Different for each row? Or do you need to update the same values for each row? – Monish B Jul 28 '23 at 05:20
  • 1
    This question is tagged as sql but does not mention which database platform - sqlserver, mysql or oracle? If you are supposed to use python code to update then you should use python tag as well. Also you can try posting on https://dba.stackexchange.com if you are interested about database performance / pagelock –  Jul 28 '23 at 05:26
  • If using Java is an option, then [JDBC](https://docs.oracle.com/javase/tutorial/jdbc/index.html) supports [batch](https://stackoverflow.com/questions/3784197/efficient-way-to-do-batch-inserts-with-jdbc) operations. There are also third-party libraries for handling CSV files such as [OpenCSV](https://www.baeldung.com/opencsv). – Abra Jul 28 '23 at 05:39
  • 1
    It's going to take a long time to loop through that many records using any method. https://sqlstudies.com/2016/08/17/rbar-vs-batch/ It will be much quicker to import the changes to a local table in the database then run one `update` query. – Nick.Mc Jul 28 '23 at 06:43
  • https://meta.stackoverflow.com/questions/388759/why-should-i-tag-my-rdbms – Reporter Jul 28 '23 at 08:06
  • @MonishB the token1, token2 values are different for each row. The values to be updated for each row will be different. – Santhosh Jul 30 '23 at 03:01
  • @Rakib its snowflake related – Santhosh Jul 30 '23 at 03:36
  • Then I think you'll need to use it with cases! Not sure though. @Santhosh – Monish B Jul 31 '23 at 18:20

0 Answers0