1

I have two tables: IsCompanyValid and CompanyData.

IsCompanyValid:

Company   IsValid
A         TRUE
B         TRUE
C         TRUE
D         FALSE

CompanyData:

Company   Data
A         Data1
A         Data1
A         Data1
B         Data1
B         Data1
C         Data1
D         Data1

I want to insert Data called 'testData' into CompanyData, for each unique company that is valid. So I want the CompanyData to look like this:

CompanyData:

Company   Data
A         Data1
A         Data1
A         Data1
B         Data1
B         Data1
C         Data1
D         Data1
A         testData
B         testData
C         testData

Here is the query I wrote:

INSERT into CompanyData (Company, Data)
SELECT
    distinct Company,
    'testData'
FROM CompanyData

-- Make sure the data we are inserting is for valid companies only
WHERE Company in
(
    select Company from IsCompanyValid
    where IsValid = true
)
AND 'testData' not in
(
    select Data from CompanyData
);

Now this works fine. But if the CompanyData table was slightly modified, so that 'testData' already exists, this query will no longer work. For example:

CompanyData:

Company   Data
A         Data1
A         Data1
A         Data1
B         Data1
B         Data1
C         Data1
D         Data1
A         testData

This query will no longer insert for B and C. I'm not sure how I can modify my query to insert for B and C.
The real data is much larger, so I need to make my solution generalized.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Tim
  • 478
  • 4
  • 15

1 Answers1

1

This inserts row like requested:

INSERT INTO CompanyData(Company, Data)
SELECT i.Company, 'testData'::text
FROM   IsCompanyValid i
WHERE  i.IsValid 
AND    NOT EXISTS (
   SELECT FROM CompanyData d
   WHERE  d.Company = i.Company
   AND    d.Data = 'testData'
   );

Companies are skipped where a row with Data = 'testData' already exists.

Should also be fast and NULL-safe (as opposed to NOT IN) with an index on CompanyData(Company, Data).
Or even faster with a more specialized partial index:

CREATE INDEX foo ON CompanyData(Company)
WHERE  Data = 'testData';

But the special index only pays if you keep needing it.

See:

Aside: consider legal, lower-case, unquoted identifiers in Postgres. See:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thanks for pointing me in the right direction Erwin! And thanks for your other advice as well! – Tim Jul 20 '20 at 18:39