I have long format data on businesses, with a row for each occurrence of a move to a different location, keyed on business id -- there can be several move events for any one business establishment.
I wish to reshape to a wide format, which is typically cross-tab territory per the tablefunc
module.
+-------------+-----------+---------+---------+
| business_id | year_move | long | lat |
+-------------+-----------+---------+---------+
| 001013580 | 1991 | 71.0557 | 42.3588 |
| 001015924 | 1993 | 71.0728 | 42.3504 |
| 001015924 | 1996 | -122.28 | 37.654 |
| 001020684 | 1992 | 84.3381 | 33.5775 |
+-------------+-----------+---------+---------+
Then I transform like so:
SELECT longbyyear.*
FROM crosstab($$
SELECT
business_id,
year_move,
max(longitude::float)
from business_moves
where year_move::int between 1991 and 2010
group by business_id, year_move
order by business_id, year_move;
$$
)
AS longbyyear(biz_id character varying, "long91" float,"long92" float,"long93" float,"long94" float,"long95" float,"long96" float,"long97" float, "long98" float, "long99" float,"long00" float,"long01" float,
"long02" float,"long03" float,"long04" float,"long05" float,
"long06" float, "long07" float, "long08" float, "long09" float, "long10" float);
And it --mostly-- gets me to the desired output.
+---------+----------+----------+----------+--------+---+--------+--------+--------+
| biz_id | long91 | long92 | long93 | long94 | … | long08 | long09 | long10 |
+---------+----------+----------+----------+--------+---+--------+--------+--------+
| 1000223 | 121.3784 | 121.3063 | 121.3549 | 82.821 | … | | | |
| 1000678 | 118.224 | | | | … | | | |
| 1002158 | 121.98 | | | | … | | | |
| 1004092 | 71.2384 | | | | … | | | |
| 1007801 | 118.0312 | | | | … | | | |
| 1007855 | 71.1769 | | | | … | | | |
| 1008697 | 71.0394 | 71.0358 | | | … | | | |
| 1008986 | 71.1013 | | | | … | | | |
| 1009617 | 119.9965 | | | | … | | | |
+---------+----------+----------+----------+--------+---+--------+--------+--------+
The only snag is that I would ideally have populated values for each year and not just have values in move years. Thus all fields would be populated, with a value for each year, with the most recent address carrying over to the next year. I could hack this with manual updates if each is blank, use the previous column, I just wondered if there was a clever way to do it either with the crosstab()
function, or some other way, possibly coupled with a custom function.