1

I have to retrieve all the records from a database that have been added since the last execution, this should happen daily.

The only thing that can identify those records from the rest is a Unix Timestamp (in milliseconds) or a Time (hhmmss) and a Date (yyyyMMdd). My problem is that all these columns are of type varchar.

The database is very big, and only getting bigger. Is there any way of getting only the rows with a Unix Timestamp higher than X without having to load the entire thing and parsing the timestamp?

What I do now is:

var sales = context.SALES.Select(s =>
  new Sale {
   Product = s.SC_PRDCT,
    Terminal = s.SC_TERM,
    Operator = s.MC_OP,
    UnixString = s.SC_TIMESTAMP
  })
 .ToList()
 .Where(m => terminals.ContainsKey(m.Terminal) && m.UnixTime > lastExecution);
        public string UnixString
        {
            get { return unixString; }
            set { unixString = value; UnixTime = long.Parse(value); }
        }
Spirotrack
  • 77
  • 1
  • 8
  • Why the tag? (Should it have been perhaps?) – jarlh Nov 28 '19 at 20:42
  • Let me guess, you can't change the database, can you? Which EF version? – Gert Arnold Nov 28 '19 at 21:11
  • No, I cannot change it. It's EF6. – Spirotrack Nov 28 '19 at 21:25
  • And do these unix timestamp happen to have leading zeros to make them equal-length? Or otherwise, could that be the least you'd be allowed to change? – Gert Arnold Nov 28 '19 at 21:32
  • Yes, they are all equal-length, since it only goes back 10 years or so. – Spirotrack Nov 28 '19 at 21:43
  • So if a string-compare will always give the same result as the long-compare, why you are not using the string.Compare ? In other cases you would have to pad-left with 0 the strings to have equal length, but this is already done here. – Holger Dec 05 '19 at 10:48
  • @Holger It didn't cross my mind at the time, but after doing it I found it much slower. If I put the where first (so the server is the one processing the query) it takes minutes to finish. If I ask for everything with a select and create objects to deal with all the data myself I might as well parse the string and perform the comparison on numbers, since comparing numbers is much faster than comparing strings, even if you take the parse into account. – Spirotrack Dec 06 '19 at 10:27

2 Answers2

1

Options that come to mind: If you have the ability to alter the schema while preserving the current fields I would consider adding a computed column to the database for a DateTime equivalent to the timestamp. Barring that, using a View to source the data for this type of search/report which can provide the translated timestamp.

If you don't have the ability to adjust the schema, then things will get a bit trickier. When you say the timestamp can be milliseconds or datetime in a string, does that mean the value can be something like either "1435234353353" (ms since Date X) or "20190827151530" for 2019-08-27 3:15:30 PM? If that is the case, as long as the length of the 2 strings, however formatted, is different then you can potentially still query against the field, it just won't be ideal:

Assuming the date option formatting is "YYYYMMDDHHMMSS":

string lastExecutionDate = targetDate.ToString("yyyyMMddHHmmss");
string lastExecutionMs = targetDate.ToUniversalTime().Subtract(
    new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc)
    ).TotalMilliseconds.ToString();

var terminalKeys = terminals.Keys.ToList(); // You probably cannot use ContainsKey in Linq2EF...
var sales = context.SALES
    .Where(s => terminalKeys.Contains(s.SC_TERM)
       && ((s.SC_TIMESTAMP.Length == 14 && s.SC_TIMESTAMP.CompareTo(lastExecutionDate) > 0)
          || (s.SC_TIMESTAMP.Length != 14 && s.SC_TIMESTAMP.CompareTo(lastExecutionMs) > 0 )))
    .Select(s =>
       new Sale 
       {
           Product = s.SC_PRDCT,
           Terminal = s.SC_TERM,
           Operator = s.MC_OP,
           UnixString = s.SC_TIMESTAMP
       }).ToList();

If the SC_TIMESTAMP column only stores the timestamps in ms, and the time/date are in separate columns, then you don't need the conditional, just format your target datetime to a timestamp string (ms since 1970-01-01) and use that.

string lastExecutionMs = targetDate.ToUniversalTime().Subtract(
    new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc)
    ).TotalMilliseconds.ToString();

var terminalKeys = terminals.Keys.ToList(); // You probably cannot use ContainsKey in Linq2EF...
var sales = context.SALES
    .Where(s => terminalKeys.Contains(s.SC_TERM)
       && s.SC_TIMESTAMP.CompareTo(lastExecutionMs) > 0)
    .Select(s =>
       new Sale 
       {
           Product = s.SC_PRDCT,
           Terminal = s.SC_TERM,
           Operator = s.MC_OP,
           UnixString = s.SC_TIMESTAMP
       }).ToList();

The caveat for this to work with the ms or datetime in the same field is that if you require timestamp or datetime is that the datetime string must be an ISO format "year Month Day Time24" which is sortable, otherwise it cannot be used in a comparison.

Steve Py
  • 26,149
  • 3
  • 25
  • 43
  • I cannot alter the schema. The time/date are in separate columns. This looks good on paper, but `.Where(s => terminalKeys.Contains(s.SC_TERM) && s.SC_TIMESTAMP > lastExecutionMs)` won't compile because you cannot use the `>` operator on strings. – Spirotrack Nov 29 '19 at 07:44
  • Ah, Yes.. I should have tried that on my dev environment.. .`CompareTo` should work. I will update the answer. – Steve Py Nov 29 '19 at 21:46
1

The lucky fact that the the unix timestamps are all equal in length, thus sortable, you can query them by the > operator in SQL.

Of course, as you' may have tried, m.SC_TIMESTAMP > lastExecution doesn't compile in C# but fortunately, EF6 (and EF3, more or less) translate the following predicate into the desired SQL predicate:

Where(m => m.SC_TIMESTAMP.CompareTo(lastExecution) > 0)

where lastExecution is a string.

Remember to add a comment to your code that this works until 2286-11-20 17:46:39, when the UNIX timestamp is 9999999999999. After that, your successors should use a more generic method that takes length into account.

Gert Arnold
  • 105,341
  • 31
  • 202
  • 291
  • `m.UnixTime > lastExecution` compiles because at that point I'm comparing a long (parsed from `UnixString` that I get from `SC_TIMESTAMP` when I initialize the `Sale` object). I think what you meant is the following: `Where(m => m.SC_TIMESTAMP.CompareTo(lastExecution) > 0)` but that gives an `Unable to create a constant value of type 'System.Object'. Only primitive types or enumeration types are supported in this context.` exception. – Spirotrack Nov 29 '19 at 07:33