3

I have a dataset with a data that is formatted like this:

Date        | exec_time
------------+---------
Today       | 99999 ms
Yesterday   | 1 ms
Tomorrow    | 50000 ms
Another Day | None Recorded
Last Day |  ms

What I need to do is write a query to get all of the exec_time values that are >= 60000

The way I've tried to write it is like this:

select exec_time 
from myTable
where exec_time not like '%N%'
and cast(split_part(exec_time,' ', 1) as int) >= 60000
order by len(exec_time) desc, exec_time desc
limit 10

However, when I run this, I get this error:

ERROR: Invalid digit, Value '2', Pos 0, Type: Integer 
  Detail: 
  -----------------------------------------------
  error:  Invalid digit, Value '2', Pos 0, Type: Integer 
  code:      1207
  context:   
  query:     2780081
  location:  :0
  process:   query0_61 [pid=0]
  -----------------------------------------------

Any ideas how I can get around this?

johncorser
  • 9,262
  • 17
  • 57
  • 102
  • I think the problem may be the `None Recorded` value. [Have a look at this answer](http://stackoverflow.com/questions/4626346/how-to-cast-from-text-to-int-if-column-contain-both-int-and-null-values-in-postg), it may help you with your problem. – waka Nov 21 '14 at 19:25
  • 1
    I changed `len()` to `length()` and, using version `9.3.1`, it worked. See http://sqlfiddle.com/#!15/5fcd5/2 – PM 77-1 Nov 21 '14 at 19:34
  • 1
    Can you change the schema and data to be more sensible? The `exec_time` should probably be a number in milliseconds and you should use `NULL` for that "none recorded". – mu is too short Nov 21 '14 at 19:43
  • It does work in the sqlfiddle, but on my machine (against the full data set) I still get the invalid digit, Value '2' – johncorser Nov 21 '14 at 19:53

2 Answers2

2

The error: WHERE conditions are not executed in any given order.
Use a CASE statement to avoid the exception.

SELECT exec_time 
FROM   myTable
WHERE  CASE WHEN exec_time NOT LIKE '%N%' THEN
         split_part(exec_time,' ', 1)::int >= 60000
       ELSE FALSE END
ORDER  BY length(exec_time) desc, exec_time desc
LIMIT  10;

While being at it, if 'None Recorded' is the only case to rule out, use a faster left-anchored check:

exec_time NOT LIKE 'N%'

If the above still errors out, check with this to find any offending rows you may have missed:

SELECT DISTINCT exec_time
FROM   myTable
WHERE  exec_time NOT LIKE '%N%'
AND    exec_time !~ '^\\d+ '  -- not all digits before the first space

In modern Postgres you only need a single backslash. '^\d+ '! Seems you have to double up on backslashes in Redshift, which seems to still use the outdated Posix escape syntax for strings by default, and without explicit declaration (E'^\\d+ ')!

Generally, it's not a good idea to mix data this way. You should have an integer column to store execution time. Much cheaper, cleaner and faster.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • @johncorser: I added a bit to debug this. – Erwin Brandstetter Nov 21 '14 at 19:54
  • The debug code yielded thousands of results, all of them look something like `52 ms` but with different numbers. Why is `exec_time !~ '^\d+ '` not taking care of those? – johncorser Nov 21 '14 at 19:57
  • @johncorser: This makes no sense ([fiddle](http://sqlfiddle.com/#!15/d41d8/3978)) - unless we are dealing with different kinds of white space, like tab characters ... check for `\t` in a regexp match or leading white space. – Erwin Brandstetter Nov 21 '14 at 20:04
  • Or maybe Redshift is still using the outdated [Posix escape syntax](http://stackoverflow.com/questions/12316953/insert-varchar-with-single-quotes-in-postgresql/12320729#12320729) for strings by default. Redshift is a weird mix of very old code and new additions. It's certainly not Postgres. Updated my check. – Erwin Brandstetter Nov 21 '14 at 20:14
  • `ERROR: type "e" does not exist` Looks like that wasn't it – johncorser Nov 21 '14 at 20:23
  • Ah- I found it by removing that E. there is also a data point that looks like this ` ms`. – johncorser Nov 21 '14 at 20:24
  • 1
    @johncorser: I added an explanation for that. – Erwin Brandstetter Nov 21 '14 at 20:39
1

I think the problem is the "None Recorded" value. I don't know if it is going to run the first where to exclude the first or not. Try this:

SELECT exec_time
FROM (SELECT exec_time FROM myTable WHERE exec_time NOT LIKE 'N%') as foo
WHERE cast(split_part(foo.exec_time, ' ', 1) as int) >= 60000
ORDER by length(foo.exec_time) desc, foo.exec_time desc
limit 10
Joe
  • 335
  • 1
  • 9
  • Interesting, it seems to work for me on my postgresql. What version of postgresql are you using? – Joe Nov 21 '14 at 20:03
  • 1
    @Joe: Your answer *should* work fine. I suspect a problem with whitespace characters not being spaces or leading white space ... – Erwin Brandstetter Nov 21 '14 at 20:06