I'm using postgres 9.1 with tablefunc:crosstab
I have a table with the following structure:
CREATE TABLE marketdata.instrument_data
(
dt date NOT NULL,
instrument text NOT NULL,
field text NOT NULL,
value numeric,
CONSTRAINT instrument_data_pk PRIMARY KEY (dt , instrument , field )
)
This is populated by a script that fetches data daily. So it might look like so:
| dt | instrument | field | value |
|------------+-------------------+-----------+-------|
| 2014-05-23 | SGX.MiniJGB.2014U | PX_VOLUME | 1 |
| 2014-05-23 | SGX.MiniJGB.2014U | OPEN_INT | 2 |
I then use the following crosstab query to pivot the table:
select dt, instrument, vol, oi
FROM crosstab($$
select dt, instrument, field, value
from marketdata.instrument_data
where field = 'PX_VOLUME' or field = 'OPEN_INT'
$$::text, $$VALUES ('PX_VOLUME'),('OPEN_INT')$$::text
) vol(dt date, instrument text, vol numeric, oi numeric);
Running this I get the result:
| dt | instrument | vol | oi |
|------------+-------------------+-----+----|
| 2014-05-23 | SGX.MiniJGB.2014U | 1 | 2 |
The problem: When running this with lot of real data in the table, I noticed that for some fields the function was splitting the result over two rows:
| dt | instrument | vol | oi |
|------------+-------------------+-----+----|
| 2014-05-23 | SGX.MiniJGB.2014U | 1 | |
| 2014-05-23 | SGX.MiniJGB.2014U | | 2 |
I checked that the dt and instrument fields were identical and produced a work-around by grouping the ouput of the crosstab.
Analysis I've discovered that it's the presence of one other entry in the input table that causes the output to be split over 2 rows. If I have the input as follows:
| dt | instrument | field | value |
|------------+-------------------+-----------+-------|
| 2014-04-23 | EUX.Bund.2014M | PX_VOLUME | 0 |
| 2014-05-23 | SGX.MiniJGB.2014U | PX_VOLUME | 1 |
| 2014-05-23 | SGX.MiniJGB.2014U | OPEN_INT | 2 |
I get:
| dt | instrument | vol | oi |
|------------+-------------------+-----+----|
| 2014-04-23 | EUX.Bund.2014M | 0 | |
| 2014-05-23 | SGX.MiniJGB.2014U | 1 | |
| 2014-05-23 | SGX.MiniJGB.2014U | | 2 |
Where it gets really weird...
If I recreate the above input table manually then the output is as we would expect, combined into a single row.
If I run:
update marketdata.instrument_data
set instrument = instrument
where instrument = 'EUX.Bund.2014M'
Then again, the output is as we would expect, which is surprising as all I've done is set the instrument field to itself.
So I can only conclude that there is some hidden character/encoding issue in that Bund entry that is breaking crosstab.
Are there any suggestions as to how I can determine what it is about that entry that breaks crosstab?
Edit: I ran the following on the raw table to try and see any hidden characters:
select instrument, encode(instrument::bytea, 'escape')
from marketdata.bloomberg_future_data_temp
where instrument = 'EUX.Bund.2014M';
And got:
| instrument | encode |
|----------------+----------------|
| EUX.Bund.2014M | EUX.Bund.2014M |