-2

I am exporting client data from a database into Salesforce. I have two tables (Fiddle shown below)

Table: SALESFORCE_CONTACTS

user_id start_at end_at company_id
37483 2021-12-12 NULL 88
23982 2020-03-02 2021-04-02 169

Table: DB_CONCATS

user_id start_at end_at company_id
37483 2021-12-12 NULL 88
23982 2020-03-02 2021-04-02 169

Every so often a column in DB_CONCATS will change and on the export (every 2 hours) the Salesforce data will change. Right now I am exporting every single contact every 2 hours.

But I only want to export the records that do not match the Salesforce table. The only way I can think of is to write something like this:

SELECT
    DB_CONTACTS.*
FROM
    DB_CONTACTS
LEFT JOIN 
    SALESFORCE_CONTACTS ON DB_CONTACTS.user_id = SALESFORCE_CONTACTS.DB_USER_ID
WHERE 
    DB_CONTACTS.user_id <> SALESFORCE_CONTACTS.db_user_id
    OR DB_CONTACTS.created_at <> SALESFORCE_CONTACTS.db_created_at
    OR DB_CONTACTS.disabled_at <> SALESFORCE_CONTACTS.db_disabled_at
    OR DB_CONTACTS.company_id <> SALESFORCE_CONTACTS.db_company_id

Great! That produces no records because the two tables are exactly alike at the moment. Once the end_at in DB_CONCATS is filled out a record will appear in the query and then, on the next extract, it will transfer it over to SALESFORCE_CONTACTS

The trouble is that the sales team is constantly adding new fields to transfer data to. So every time that is done I would have to add a line to the query above and it would be trouble to maintain. Is there an easier way to compare the two tables without so many lines of code?

http://sqlfiddle.com/#!9/3ca536/9

Thank you

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Tony
  • 29
  • 3
  • Move all the JOIN conditions to the ON clause to get true LEFT JOIN result. – jarlh Feb 21 '22 at 19:00
  • you could use dynamic sql atfer getting all columns in https://dev.mysql.com/doc/refman/8.0/en/information-schema-columns-table.html – nbk Feb 21 '22 at 19:00
  • 2
    Which dbms are you using? – jarlh Feb 21 '22 at 19:01
  • If your query joins on user_id, then `DB_CONTACTS.user_id <> SALESFORCE_CONTACTS.db_user_id` is logical nonsense. And that outer join is logically converted into an inner join due to the WHERE clause. You need better test data to see these flaws. – SMor Feb 21 '22 at 19:04
  • Using `SELECT *` is antipattern, but if you are sure that both tables are always in sync with column order then using `EXISTS (SELCECT t.1* EXCEPT SELECT t2.*)` https://dbfiddle.uk/?rdbms=postgres_14&fiddle=20b1fe6683e475c20a507aba52b08494, which is implenentation of **[IS DISTINCT FROM](https://stackoverflow.com/a/50280408/5070879)** – Lukasz Szozda Feb 21 '22 at 19:04

1 Answers1

1

compare the schema and build a string to build the sproc / alter table programatically if they change it constantly. If the schema has changed you might throw all away and re-build the table and re-populate it from scratch, and if the schema has NOT changed:

SQL discards rows if you have any comparison against a NULL value. so <> won't cut it for such scenarios, I suggest the following as long as USER_ID is never null and is UNIQUE in both tables:

declare @a table (user_id int,  start_at datetime,  end_at datetime,    company_id int);
declare @b table (user_id int,  start_at datetime,  end_at datetime,    company_id int)

insert into @a values(37483,'2021-12-12',   NULL,   88)
insert into @a values(23982,'2020-03-02',   '2021-04-02',   169)

insert into @b values(37483,'2021-12-12',   NULL,   88)
insert into @b values(23982,'2020-03-02',   '2021-04-02',   169)

update @b set start_at = null where user_id = 37483

SELECT
  *
FROM
  @a a
  full join @b b on a.user_id = b.user_id
WHERE NOT EXISTS(
    select a.start_at, a.end_at, a.company_id
    intersect
    select b.start_at, b.end_at, b.company_id
)
Alex
  • 2,247
  • 1
  • 27
  • 37