1

I have been given a table that has over 2 Billion rows in it. It has a field for time entered but the creator of the table made this field a string field. It also does not follow normal date-time conventions.

I have been tasked to create a new field that is a datetime type field containing the same time but converted to be proper format so that queries can be run on it.

Currently I have written a C# console application that SELECTS the top 100000 rows that have not been updated yet and row by row converts the string to a time. It then updates each of the rows.

This process works but it is slow and time is of the essence. I can run multiple copies of my program and am trying to come up with a solution to somehow run the program multiple times and make sure each copy of the program is updating different rows.

Ideas so far:

  • instead of selecting top 100000 rows, select 1000000 random rows. (there may be some overlap but it would get the job done)
  • this table has an id field. I could do a select where id modulo 2 == 0 for one program, id % 2 != 0 for another (this could continue for prime numbers etc.)
  • get rows only within an id range that is specified for each copy of the program I have running
  • add a locked column to tell my program that a field is currently locked (which would represent that it is being processed)

Can anyone provide a better suggestion? Improve on one of mine?

Thanks.

UPDATE: an example of my existing time string is 12/Nov/2014:08:52:22 and it needs to be converted to 2014-11-12 08:42:22

Scott
  • 15
  • 1
  • 4
  • 3
    Why don't you do use SQL Server to do the conversion instead of going back and forth between the database and console app? – Jeff Ogata Nov 06 '14 at 13:53
  • Does your data follow the same format? You should try to use the SQL server for this: http://stackoverflow.com/questions/1509977/convert-varchar-into-datetime-in-sql-server – AlexanderBrevig Nov 06 '14 at 13:54
  • I agree, just run it in management studio & have a coffee while you wait. – Alex K. Nov 06 '14 at 13:55
  • But, Is there any possibility for timeout or directly running in sql server slows the server's other processes. – Veera Nov 06 '14 at 13:56
  • If you must do it in a console app, take a look at LINQ to SQL and the Skip() and Take() extension methods available. – thiag0 Nov 06 '14 at 13:57
  • Assuming that this is a one time conversion I would use a TSQL script to loop through selecting, say, 1000 rows that haven't been updated and updating that block of rows. Loop until the `SELECT` doesn't return any target rows. That should reduce the impact on other processes. (A `WAIT` can be thrown in the loop to spread the load out over time.) If other processes are updating the source column then you have a somewhat more complex problem to deal with. – HABO Nov 06 '14 at 14:02
  • If you really have to use app, then select top 100000 and save it. Next time you run your program select next 100000. – Dino Velić Nov 06 '14 at 14:17

3 Answers3

2

Use LINQ to SQL and only pull a certain amount of records at a time.

Your code would look something along these lines:

using (var db = new MyDbContext()) 
{
    var results = db.GetResultsFromDatabase();
    int take = 100;
    int processed = 0;

    while(processed < results.Count()) 
    {
        var set = results.Skip(processed).Take(take);
        set.ForEach(s => {  
            // update the date
        }

        processed += take;
    }

    db.SubmitChanges();
}
thiag0
  • 2,199
  • 5
  • 31
  • 51
  • This is essentially what I am already doing. I need something to be able to run the same thing at the same time but process different records. – Scott Nov 06 '14 at 14:22
1

Just do it inside SQL instead of reading the entire database and working it.

use the following :

    UPDATE <TableName>
    SET <dateTime-Date> = CONVERT(datetime, <string-Date>)

check this site for any format specifications you would like : SQL Datetime Conversion

Noxymon
  • 201
  • 4
  • 15
  • Based on my current Date string I do not believe I can simply convert it. – Scott Nov 06 '14 at 14:13
  • can you give an example of the date string ? we might be able to help or at least tell you if its convertible. – Noxymon Nov 06 '14 at 14:19
  • Updated the first post. Current string is 12/Nov/2014:08:52:22 and I'd like it to be 2014-11-12 08:42:22 – Scott Nov 06 '14 at 14:20
  • This is a really uncommon way for saving dates. Anyhow, i recommend first changing all the NOV,DEC,JAN month names into numeric values first, and then active the CONVERT function. You can use Switch case to determine which no to relate to which month. upon completion your date will be in a valid format. 12/11/2014 08:52:22, and then you can use the convert method. – Noxymon Nov 06 '14 at 15:03
1

So here is my solution, I have no idea how efective is for 2mil rows, in case you want to do it in SQL.

Declare @string varchar(50) = '12/Nov/2014:08:52:22'
SELECT CONVERT(datetime,REPLACE(SUBSTRING(@string,0,CHARINDEX ( ':' ,@string , 0)),'/',' ')
 +' '+
SUBSTRING(@string,CHARINDEX ( ':' ,@string , 0)+1,LEN(@string)),120)

Let me explain you a bit the code

REPLACE(SUBSTRING(@string,0,CHARINDEX ( ':' ,@string , 0)),'/',' ') 

replaces the "/" chars from the date and returns 12 Nov 2014

 SUBSTRING(@string,CHARINDEX ( ':' ,@string , 0)+1,LEN(@string)),120)

get the time without the ":" between the date and time from your initial string.

Finally added a space between this two string manipulations and converted it to datetime.

Output for the query above : '2014-11-12 08:52:22.000'

CiucaS
  • 2,010
  • 5
  • 36
  • 63