0

I'm having a hard time explaining this through writing, so please be patient.

I'm making this project in which I have to choose a month and a year to know all the active employees during that month of the year.. but in my database I'm storing the dates when they started and when they finished in dd/mm/yyyy format.

So if I have an employee who worked for 4 months eg. from 01/01/2013 to 01/05/2013 I'll have him in four months. I'd need to make him appear 4 tables(one for every active month) with the other employees that are active during those months. In this case those will be: January, February, March and April of 2013.

The problem is I have no idea how to make a query here or php processing to achieve this.

All I can think is something like (I'd run this query for every month, passing the year and month as argument)

pg_query= "SELECT employee_name FROM employees
           WHERE month_and_year between start_date AND finish_date"

But that can't be done, mainly because month_and_year must be a column not a variable.
Ideas anyone?

UPDATE

Yes, I'm very sorry that I forgot to say I was using DATE as data type.

The easiest solution I found was to use EXTRACT

select * from employees where extract (year FROM start_date)>='2013'
AND extract (month FROM start_date)='06' AND extract (month FROM finish_date)<='07'

This gives me all records from june of 2013 you sure can substite the literal variables for any variable of your preference

Jesus Rodriguez
  • 2,571
  • 2
  • 22
  • 38
  • Please confirm the datatype of your start and end date fields. Are they date or char? – Dan Bracuk Jul 04 '13 at 18:48
  • More generally, please provide your version of Postgres and the table definition. – Erwin Brandstetter Jul 04 '13 at 20:19
  • 1
    The solution you present will be very slow with big tables since the `WHERE` conditions are **[not sargable](http://en.wikipedia.org/wiki/Sargable)**. I would consider either my or Clodoaldo's answer. (Not Gordon's, which also fails in this respect.) Test with [`EXPLAIN ANALYZE`](http://www.postgresql.org/docs/current/interactive/sql-explain.html) to see for yourself. – Erwin Brandstetter Jul 08 '13 at 19:31
  • I agree. my solution is far from optimal that is why I marked Clodoaldo's as correct – Jesus Rodriguez Jul 08 '13 at 21:32

3 Answers3

2

Assuming that you really are not storing dates as character strings, but are only outputting them that way, then you can do:

SELECT employee_name
FROM employees
WHERE start_date <= <last date of month> and
      (finish_date >= <first date of month> or finish_date is null)

If you are storing them in this format, then you can do some fiddling with years and months.
This version turns the "dates" into strings of the form "YYYYMM". Just express the month you want like this and you can do the comparison:

select employee_name
from employees e
where right(start_date, 4)||substr(start_date, 4, 2) <= 'YYYYMM' and
      (right(finish_date, 4)||substr(finish_date, 4, 2) >= 'YYYYMM' or finish_date is null)

NOTE: the expression 'YYYYMM' is meant to be the month/year you are looking for.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
2

First, you can generate multiple date intervals easily with generate_series(). To get lower and upper bound add an interval of 1 month to the start:

SELECT g::date                       AS d_lower
    , (g + interval '1 month')::date AS d_upper
FROM  generate_series('2013-01-01'::date, '2013-04-01', '1 month') g;

Produces:

  d_lower   |  d_upper
------------+------------
 2013-01-01 | 2013-02-01
 2013-02-01 | 2013-03-01
 2013-03-01 | 2013-04-01
 2013-04-01 | 2013-05-01

The upper border of the time range is the first of the next month. This is on purpose, since we are going to use the standard SQL OVERLAPS operator further down. Quoting the manual at said location:

Each time period is considered to represent the half-open interval start <= time < end [...]

Next, you use a LEFT [OUTER] JOIN to connect employees to these date ranges:

SELECT to_char(m.d_lower, 'YYYY-MM') AS month_and_year, e.*
FROM  (
   SELECT g::date                       AS d_lower
       , (g + interval '1 month')::date AS d_upper
   FROM   generate_series('2013-01-01'::date, '2013-04-01', '1 month') g
   ) m
LEFT   JOIN employees e ON (m.d_lower, m.d_upper)
                  OVERLAPS (e.start_date, COALESCE(e.finish_date, 'infinity'))
ORDER  BY 1;
  • The LEFT JOIN includes date ranges even if no matching employees are found.

  • Use COALESCE(e.finish_date, 'infinity')) for employees without a finish_date. They are considered to be still employed. Or maybe use current_date in place of infinity.

  • Use to_char() to get a nicely formatted month_and_year value.

  • You can easily select any columns you need from employees. In my example I take all columns with e.*.

  • The 1 in ORDER BY 1 is a positional parameter to simplify the code. Orders by the first column month_and_year.

  • To make this fast, create an multi-column index on these expressions. Like

    CREATE INDEX employees_start_finish_idx
    ON employees (start_date, COALESCE(finish_date, 'infinity') DESC);
    

    Note the descending order on the second index-column.

  • If you should have committed the folly of storing temporal data as string types (text or varchar) with the pattern 'DD/MM/YYYY' instead of date or timestamp or timestamptz, convert the string to date with to_date(). Example:

    SELECT to_date('01/03/2013'::text, 'DD/MM/YYYY')
    

    Change the last line of the query to:

    ...
    OVERLAPS (to_date(e.start_date, 'DD/MM/YYYY')
             ,COALESCE(to_date(e.finish_date, 'DD/MM/YYYY'), 'infinity'))
    

    You can even have a functional index like that. But really, you should use a date or timestamp column.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Very comprehensive! A couple of niggles with the sample query: using single-letter table aliases (what does `m` stand for?) and omitting the `AS` keyword makes it harder for non-experts to follow; some extra line breaks wouldn't hurt, either. I'm also not a fan of `ORDER BY 1` for clarity (it looks at first glance like you're ordering by a constant!), but I know that's more a matter of opinion. – IMSoP Jul 04 '13 at 21:06
  • @IMSoP: Well, I have reasons for each of these syntax decisions, but the answer is long enough as it is. I prefer short table aliases for clarity (m is for month) and I tend to use positional parameters instead of repeating longer expressions. I always include the keyword `AS` for column aliases (as recommended in the manual) but trim the noise word for table aliases. I see your point arguing a more verbose syntax style but, as you said, that's a matter of opinion or taste and style. I much prefer less noise for clarity. – Erwin Brandstetter Jul 04 '13 at 21:16
  • Agree with you on "matter of opinion" for everything except table aliases. Aliases are like variable names, and just as `$m` is a bad name for a PHP variable, `m` is a bad alias for a derived table. `e` for `employees` is not as bad, although in a complex query like that I'd be tempted to go with `emp` for the extra clarity (rather than just *brevity*). – IMSoP Jul 04 '13 at 21:21
  • @IMSoP: For the limited scope of a single query (as opposed to a generally much longer PHP function) I prefer short table aliases. If the beast is bigger (including SQL queries), more descriptive identifiers are in order ... – Erwin Brandstetter Jul 04 '13 at 21:24
2

There is no need to create a range to make an overlap:

select to_char(d, 'YYYY-MM') as "Month", e.name
from
    (
        select generate_series(
            '2013-01-01'::date, '2013-05-01', '1 month'
        )::date
    ) s(d)
    inner join
    employee e on
        date_trunc('month', e.start_date)::date <= s.d
        and coalesce(e.finish_date, 'infinity') > s.d
order by 1, 2

SQL Fiddle

If you want the months with no active employees to show then change the inner for a left join


Erwin, about your comment:

the second expression would have to be coalesce(e.finish_date, 'infinity') >= s.d

Notice the requirement:

So if I have an employee who worked for 4 months eg. from 01/01/2013 to 01/05/2013 I'll have him in four months

From that I understand that the last active day is indeed the previous day from finish.

If I use your "fix" I will include employee f in month 05 from my example. He finished in 2013-05-01:

('f', '2013-04-17', '2013-05-01'),

SQL Fiddle with your fix

Clodoaldo Neto
  • 118,695
  • 26
  • 233
  • 260
  • A range & `OVERLAPS` it is the more elegant solution and performs well in combination with the suggested index - as opposed to this query which would need a more specialized index with *two* functional expressions. It is also incorrect: the second expression would have to be `coalesce(e.finish_date, 'infinity') ` **`>=`** `s.d`. Better yet, make that: `ON e.start_date < s.d + interval '1 month' AND coalesce(e.finish_date, 'infinity') >= s.d`. That would be the other good solution [like outlined in this related answer](http://stackoverflow.com/a/15305292/939860). – Erwin Brandstetter Jul 05 '13 at 01:34
  • @Erwin I don't think more complex, as in creating a range to use an overlap operator, is more elegant. Indeed I only see elegance in simplicity. Your single functional expression index is better but not because it is less specialized, as you already lost generality with one only functional expression, but because the functional expression is the second term, meaning it lost only some generality as the first term will still be generally useful. And yes, and here it applies, it is simpler and so more elegant. – Clodoaldo Neto Jul 05 '13 at 16:40
  • OK, I see. `>` or `>=` really depends on the *meaning* of the underlying data: whether the last day of the range is supposed to be included or excluded. `OVERLAPS` includes the lower border by default and excludes the upper. – Erwin Brandstetter Jul 05 '13 at 17:18