I have been given a table that has over 2 billion rows in it. It has a time-entered field, but the creator of the table made it a string field, and the values do not follow normal date-time conventions.
I have been tasked with creating a new field of datetime type containing the same time, converted to a proper format so that queries can be run on it.
Currently I have written a C# console application that SELECTs the top 100,000 rows that have not been converted yet, converts the string to a datetime row by row, and then updates each row.
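For context, that approach boils down to a claim-convert-update loop per batch. Here is a rough sketch of the loop, written in Python against an in-memory SQLite table so it is runnable on its own (the real application is C# against the production table, and the table/column names and batch size here are made up for illustration):

```python
import sqlite3
from datetime import datetime

BATCH_SIZE = 3  # stands in for the real 100,000

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, raw_time TEXT, converted TEXT)")
conn.executemany(
    "INSERT INTO log (raw_time) VALUES (?)",
    [("12/Nov/2014:08:52:22",), ("13/Nov/2014:09:00:00",),
     ("14/Nov/2014:10:15:30",), ("15/Nov/2014:11:20:40",)],
)

while True:
    # Grab the next batch of rows that have not been converted yet.
    batch = conn.execute(
        "SELECT id, raw_time FROM log WHERE converted IS NULL LIMIT ?",
        (BATCH_SIZE,),
    ).fetchall()
    if not batch:
        break
    for row_id, raw in batch:
        # Convert row by row and write the result back.
        converted = datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S").strftime("%Y-%m-%d %H:%M:%S")
        conn.execute("UPDATE log SET converted = ? WHERE id = ?", (converted, row_id))
    conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM log WHERE converted IS NULL").fetchone()[0]
print(remaining)  # 0 once every row has been converted
```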
This process works, but it is slow, and time is of the essence. I can run multiple copies of my program, so I am trying to come up with a way to run it multiple times while making sure each copy updates a different set of rows.
Ideas so far:
- instead of selecting the top 100,000 rows, select 1,000,000 random rows (there may be some overlap between copies, but it would get the job done)
- this table has an id field. I could SELECT WHERE id % 2 = 0 for one copy and id % 2 = 1 for another (and more generally WHERE id % N = k to split the rows across N copies)
- give each running copy of the program its own fixed id range and have it process only the rows in that range
- add a locked column so each copy can mark the rows it is currently processing and skip rows locked by another copy
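To illustrate the modulo idea above: with N copies running, giving copy k the predicate id % N = k splits the ids into disjoint sets that together cover every row, so no two copies ever claim the same row. A small Python sketch of that property (the number of copies and the id range here are made-up values for illustration):

```python
NUM_COPIES = 4  # hypothetical number of program copies

def rows_for_copy(ids, copy_index, num_copies=NUM_COPIES):
    """Ids that copy `copy_index` would claim with WHERE id % num_copies = copy_index."""
    return {i for i in ids if i % num_copies == copy_index}

all_ids = set(range(1, 101))  # stand-in for the table's id column
partitions = [rows_for_copy(all_ids, k) for k in range(NUM_COPIES)]

# The partitions are pairwise disjoint and cover every id exactly once.
assert set().union(*partitions) == all_ids
assert sum(len(p) for p in partitions) == len(all_ids)
```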
Can anyone provide a better suggestion? Improve on one of mine?
Thanks.
UPDATE: an example of my existing time string is 12/Nov/2014:08:52:22, and it needs to be converted to 2014-11-12 08:52:22
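For reference, that conversion is a plain format change. A minimal sketch in Python (in the C# application, I believe DateTime.ParseExact with the custom format string "dd/MMM/yyyy:HH:mm:ss" and the invariant culture would do the same job):

```python
from datetime import datetime

# Source format: 12/Nov/2014:08:52:22 -> target: 2014-11-12 08:52:22
SRC_FORMAT = "%d/%b/%Y:%H:%M:%S"   # day/abbreviated-month/year:hour:minute:second
DST_FORMAT = "%Y-%m-%d %H:%M:%S"

def convert(raw: str) -> str:
    """Parse the legacy time string and re-emit it in datetime-friendly form."""
    return datetime.strptime(raw, SRC_FORMAT).strftime(DST_FORMAT)

print(convert("12/Nov/2014:08:52:22"))  # 2014-11-12 08:52:22
```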