2

I'm trying to join two tables together based on an ID column. The join is not working successfully because I cannot join a varchar column on an integer column, despite using cast().

In the first table, the ID column is character varying, in the format of: XYZA-123456. In the second table, the ID column is simply the number: 123456.

-- TABLE 1
create table fake_receivers(id varchar(11));

insert into fake_receivers(id) values 
('VR2W-110528'),
('VR2W-113640'),
('VR4W-113640'),
('VR4W-110528'),
('VR2W-110154'),
('VMT2-127942'),
('VR2W-113640'),
('V16X-110528'),
('VR2W-110154'),
('VR2W-110528');

-- TABLE 2
create table fake_stations(receiver_sn integer, station varchar);

insert into fake_stations values 
('110528', 'Baff01-01'),
('113640', 'Baff02-02'),
('110154', 'Baff03-01'),
('127942', 'Baff05-01');

My solution is to split the string at the dash, take the number after the dash, and cast it as an integer, so that I may perform the join:

select cast(split_part(id, '-', 2) as integer) from fake_receivers; -- this works fine, seemingly selects integers

However, when I actually attempt to perform the join, I'm getting the following error, despite using an explicit cast:

select cast(split_part(id, '-', 2) as integer), station
        from fake_receivers 
        inner join fake_locations 
        on split_part = fake_locations.receiver_sn -- not recognizing split_part cast as integer! 

>ERROR: operator does not exist: character varying = integer
>Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts.

Strangely enough, I can perform this join with my full dataset (a queried result set shows up) but I then can't manipulate it at all (e.g. sorting, filtering it) - I get an error saying ERROR: invalid input syntax for integer: "UWM". The string "UWM" appears nowhere in my dataset or in my code, but I strongly suspect it has to do with the split_part cast from varchar to integer going wrong somewhere.

-- Misc. info
select version();
>PostgreSQL 10.5 on x86_64-apple-darwin16.7.0, compiled by Apple LLVM version 9.0.0 (clang-900.0.39.2), 64-bit

EDIT: dbfiddle exhibiting behavior

spops
  • 572
  • 1
  • 7
  • 25
  • NOTE to anybody reading this at the moment: minor errors in my code above which I will fix tomorrow, but Tim's answer below uses correct code that correctly reproduces & then subsequently fixes the issue. – spops Nov 20 '18 at 03:08
  • "Smart keys"--your fake_receivers id values--are a design antipattern. – philipxy Nov 20 '18 at 03:50
  • Not sure I understand, can you explain? If you're referring to the fact that there are letters & dash before the number - nothing I can do about that, that's the way the sensor hardware spits out the data; in the second table, we don't always know the letter code associated with a serial number, but always know the serial number and that the serial numbers will always line up. Not ideal but splitting the string in the first table is the only option I have. – spops Nov 20 '18 at 17:15
  • 1
    Whatever raw sensor data we log, we can extract appropriate data into a more suitable schema. [Re "smart keys".](https://stackoverflow.com/a/28454136/3404097) – philipxy Nov 21 '18 at 08:02

1 Answers1

4

You need to include your current logic directly in the join condition:

select *
from fake_receivers r
inner join fake_stations s
    on split_part(r.id, '-', 2)::int = s.receiver_sn;

Demo

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • That definitely helps. I'm still getting the mysterious `ERROR: invalid input syntax for integer: "UWM"` in my full db - will update the above fiddle with more complex data to see if I can recreate it. – spops Nov 20 '18 at 03:07
  • 2
    Then most likely your data has some problem. – Tim Biegeleisen Nov 20 '18 at 03:18