2

I am trying to concatenate 2 columns (one is a CHAR(7) and the other is CHAR(6)), add an _ separator between the 2 and limit the result to 10,000 records. I also want to take a random sample.

I have read a number of excellent posts on here including:

Combine two columns and add into one new column

The above post as well as others I have looked at had great starting points for me but I have not yet managed to deliver the expected result.

First I tried:

SELECT osnrth1m||'_'||oseast1m FROM postzon
ORDER BY RANDOM()
LIMIT 10000

While it delivered the correct result of the form '1234567_123456' (ignore the quotes) there were a number of results that were '_'.

The linked post above said that this was because using the concatenation operator (i.e attribute1||attribute2) does not account for NULL values.

I then tried to use the COALESCE function:

SELECT COALESCE(osnrth1m,'')||'_'||COALESCE(oseast1m,'') FROM postzon
ORDER BY RANDOM()
LIMIT 10000;

But this also produced some results that were just underscores as per previous.

I then tried the CONCAT_WS function:

SELECT CONCAT_WS('_',osnrth1m,oseast1m) FROM postzon
ORDER BY RANDOM()
LIMIT 10000;

But unfortunately again some rows returned with just an underscore.

Finally I tried messing about with CASE - I couldn't work out a way to write the query so that NULL values were ignored. In fact the last one I tried specifically says if NULL then show result as NULL (that's what I think it's saying anyway!):

SELECT CASE WHEN (osnrth1m,oseast1m) IS NULL THEN NULL
        ELSE CONCAT_WS('_',osnrth1m,oseast1m) 
        END
        FROM postzon
ORDER BY RANDOM()
LIMIT 10000;

I'm sorry if I'm missing something glaringly obvious but could anyone suggest a way achieve this that will just skip NULL values completely?

Something along the lines of:

SELECT CASE WHEN (osnrth1m,oseast1m) IS NULL THEN IGNORE/DROP/EXCLUDE
            ELSE CONCAT_WS('_',osnrth1m,oseast1m) 
            END
            FROM postzon
    ORDER BY RANDOM()
    LIMIT 10000;

Is there a PostgreSQL operator like IGNORE or DROP or EXCLUDE?

For each of the queries above I tried including the following line after the SELECT statement:

WHERE (osnrth1m IS NOT NULL OR oseast1m IS NOT NULL) 

but this didn't seem to change the results.

Any help would be gratefully received!

Thanks

Community
  • 1
  • 1
mmTmmR
  • 573
  • 2
  • 8
  • 20
  • this should run fine, weird SELECT COALESCE(osnrth1m,'')||'_'||COALESCE(oseast1m,'') FROM postzon WHERE (osnrth1m IS NOT NULL OR oseast1m IS NOT NULL) ORDER BY RANDOM() LIMIT 10000; – LongBeard_Boldy Nov 16 '15 at 15:32
  • @LongBeard_Boldy - thanks for the quick reply. I tried this again but this time copying it from your answer incase I had made a silly typo - still gives me some results with underscores. Have you got any other suggestions? Many thanks – mmTmmR Nov 16 '15 at 15:46
  • maybe ty this one ? SELECT COALESCE(osnrth1m,'')||'_'||COALESCE(oseast1m,'') FROM postzon WHERE not (COALESCE(osnrth1m,'')||'_'||COALESCE(oseast1m,'') = '_') ORDER BY RANDOM() LIMIT 10000; – LongBeard_Boldy Nov 16 '15 at 16:01
  • @LongBeard_Boldy - thanks so much mate, managed to get it working using stas.yaranov's suggestion. Really appreciate your help - if I could select 2 answers than I would! – mmTmmR Nov 16 '15 at 16:05

1 Answers1

1

You need both osnrth1m and oseast1m to be not null. This is why you should use and but not or:

WHERE (osnrth1m IS NOT NULL AND oseast1m IS NOT NULL)

And the query is:

SELECT
        CONCAT_WS('_',osnrth1m,oseast1m)
    FROM postzon
    WHERE (osnrth1m IS NOT NULL AND oseast1m IS NOT NULL)
    ORDER BY RANDOM()
    LIMIT 10000;

If there a empty osnrth1m and oseast1m values which also need to be excluded then:

SELECT
            CONCAT_WS('_',osnrth1m,oseast1m)
        FROM postzon
        WHERE
            osnrth1m IS NOT NULL AND
            oseast1m IS NOT NULL AND
            osnrth1m <> '' AND
            oseast1m <> ''
        ORDER BY RANDOM()
        LIMIT 10000;
stas.yaranov
  • 1,797
  • 10
  • 17
  • Hi @stas.yaranov, thanks for the quick reply. I had actually tried the AND instead of OR but was worried that missed the case where one of them was NULL but the other wasn't. I tried your query anyway (the one I tried before using AND just used ||) unfortunately it also produced some results with just underscores. Have you got any other ideas? Many thanks – mmTmmR Nov 16 '15 at 15:43
  • stas.yaranov is right, your main problem is not with null values, those are empty strings. thats why you get those underscores – LongBeard_Boldy Nov 16 '15 at 15:50
  • @stas.yaranov Thanks so much that was exactly what that was! I was under the impression that all blank attributes in this database were NULL, not empty strings so a massive massive thanks for pointing that out to me asI would never have thought about that - life saver! – mmTmmR Nov 16 '15 at 16:02