0

I have a query which exports results to a text file. The code is below:

Copy (
      Select
           foo.gid As addr_ID, 
           bar.geom As streets,
           St_LineInterpolatePoint
               (ST_LineMerge(bar.geom),
                  St_LineLocatePoint
                     (St_LineMerge(bar.geom),foo.geom)
           ) As interpolated_point
      FROM foo
      Left JOIN bar ON ST_DWithin(foo.geom, bar.geom, 50)
      ORDER BY  
      foo.gid, St_Distance(foo.geom, bar.geom)
     )      
To '~path/my_file.txt' WITH DELIMITER ',';

Table 'foo' contains 87 rows and 'bar' contains 16,060 rows. Table 'foo' also contains two columns 'start_time' and 'end_time' depicting an interval of varying years like this:

Start_time              End_time
2003-06-07 00:00:00     2004-09-30 23:59:59
2004-02-03 00:00:00     2005-03-10 23:59:59
2003-07-09 00:00:00     2012-05-06 23:59:59

I would like to export my select query results so that it generates TEXT files for each year like this:

Expected Output:

results_2003.txt
results_2004.txt
results_2005.txt
and so on...

Can someone suggest me how to do that? I am using PostgreSQL version 9.5 (x64) on Windows 7 Enterprise (x64).

EDIT (1):

The text files need to be exported for each year for example: In 'start_time' and 'end_time' columns, if lower bound is 2003 and upper bound is 2012 then file for each year needs to be exported as shown in expected output.

EDIT (2):

So far I have been trying to develop the full working code (many thanks to @Phillip for code and suggested links) but I am running into problems as I am repeatedly getting a syntax error. The error says "syntax error at or near "1" (Where clause). Below I have posted code (again CREDIT goes to Phillip):

-- In the below code: Addr, streets, agfisotime (start time in data type timestamp) and agtisotime (end time in data type timestamp) are original columns

DO $$

DECLARE
start_year date;
end_year date;
current_year date;
file_name text;
 BEGIN
select date_trunc('year', min(agfisotime)), date_trunc('year', max(agtisotime))
  into start_year, end_year
  from Addr;

FOR current_year IN
    SELECT * FROM generate_series(start_year, end_year, interval '1 year')
LOOP

 Execute format('
    Copy(
        Select
        Addr.gid As addr_ID, 
        streets.geom As streets,
        St_LineInterpolatePoint(ST_LineMerge(streets.geom),
            St_LineLocatePoint(St_LineMerge(streets.geom),Addr.geom))
            As interpolated_point
        FROM Addr
        Left JOIN streets ON ST_DWithin(Addr.geom, streets.geom, 50)
        WHERE agfisotime < current_year + interval '1 year'
        AND agtisotime >= current_year
        ORDER BY  
        Addr.gid, St_Distance(Addr.geom, streets.geom)
    ) TO ''C:/temp/out.csv'';
    ',current_year);

END LOOP;
RETURN;
END
$$
khajlk
  • 791
  • 1
  • 12
  • 32
  • 2
    If you have a start in 2003 and and end in 2005 then which file(s) should it appear in? 2003, 2004, 2005 or all of them? – Philip Couling Aug 10 '16 at 15:28
  • @PhilipCouling: The two columns 'start_time' and 'end_time' contains a range of years for example from 2003 to 2007 and text file of each year needs to be exported separately (one file for each year having all records of that year). So, specifically giving you the answer: all of them. – khajlk Aug 11 '16 at 06:40

1 Answers1

2

This is quite complicated but it is possible. You basic problem is that you need to fire the COPY command once for every file you want to write, and you need to do so using dynamic SQL. There's an explanation of that here: Use function variable in dynamic COPY statement

To actually fire the COPY command multiple times you need to use an anonymous code block or a function to retrieve a list of years and then export by firing the COPY command for each year.

I've tested the following and it show's the basic principle although it's not exporting the correct data:

DO $$
DECLARE
    start_year date;
    end_year date;
    current_year date;
    file_name text;
BEGIN
    select date_trunc('year', min(start_time)), date_trunc('year', max(end_time))
      into start_year, end_year
      from foo;

    FOR current_year IN
        SELECT * FROM generate_series(start_year, end_year, interval '1 year')
    LOOP

    file_name := '/tmp/myFile_' || extract(year from current_year) || '.csv';

     --- do your export here using current_year to filter the results
    EXECUTE format('
        COPY (
           select start_time, end_time from foo
         WHERE start_time < %L::date + interval ''1 year''
           AND end_time >= %L::date
          )
        TO %L WITH DELIMITER '','';
        ',
    current_year, current_year, file_name);


    END LOOP;
END
$$;

For each year you will need to filter the results. So your query will look something like this (untested):

EXECUTE format('
    COPY (
Select
   foo.gid As addr_ID, 
   bar.geom As streets,
   St_LineInterpolatePoint
       (ST_LineMerge(bar.geom),
          St_LineLocatePoint
             (St_LineMerge(bar.geom),foo.geom)
   ) As interpolated_point
FROM foo
Left JOIN bar ON ST_DWithin(foo.geom, bar.geom, 50)
WHERE start_time < %L::date + interval ''1 year''
       AND end_time >= %L::date
ORDER BY  
foo.gid, St_Distance(foo.geom, bar.geom)

      )
    TO %L WITH DELIMITER '','';
    ',
current_year, current_year, file_name);
Community
  • 1
  • 1
Philip Couling
  • 13,581
  • 5
  • 53
  • 85
  • Thank you for your help. Although, I have stitched the pieces and tried to make a full working code but I am getting a syntax error. I also followed the dynamic EXECUTE format() construct given at Stack Overflow link but unable to fix the error. – khajlk Aug 12 '16 at 09:45
  • I have edited the question and added details about the error I am getting. – khajlk Aug 12 '16 at 10:18
  • 1
    Looks like one of the problems is that you havn't escaped your quotes for the interval for example `'1 year'` becomes `''1 year''`. I've done some more tests and found there are a few gotchas here so I'll edit the answer. – Philip Couling Aug 12 '16 at 10:31
  • The query works now however, individual year files contains data from other years as well (as you mentioned data is not correctly exported). I tested first part, it worked. Tested second part it worked but when ran the select query separetely it gave a 'syntax error at %L::data at Where Clause' – khajlk Aug 12 '16 at 11:30
  • Yeah so the `%L` is replaced with the year in quotes eg: `'2016-01-01'` or the file name in quotes. So to debug you'd need to change the where clause to something like `WHERE start_time < '2016-01-01'::date + interval '1 year' AND end_time >= '2016-01-01'::date`. The last `%L` is of course replaced by the file name. – Philip Couling Aug 12 '16 at 12:27