3

I'm trying to get an id from a companies table where the id is not yet in the crawlLog table. Then I need to insert that companyId into the crawlLog table.

I need to do this in one call so that parallel crawlers don't pull the same url after some other crawler has selected a url, but hasn't inserted it into the crawl log yet. I don't want to lock tables because of other problems that generates.

I get this error from both queries below:

You can't specify target table 'crawlLog' for update in FROM clause

Here are two queries i've tried to do the same thing.

INSERT INTO crawlLog (companyId, timeStartCrawling)
VALUES
(
    (
        SELECT companies.id FROM companies
        LEFT OUTER JOIN crawlLog
        ON companies.id = crawlLog.companyId
        WHERE crawlLog.companyId IS NULL
        LIMIT 1
    ),
    now()
)

I've also tried this, but get the same error:

INSERT INTO crawlLog (companyId, timeStartCrawling)
VALUES
(
    (
        SELECT id
        FROM companies
        WHERE id NOT IN
        (
            SELECT companyId
            FROM crawlLog
        )
        LIMIT 1
    ),
    now()
)
OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
T. Brian Jones
  • 13,002
  • 25
  • 78
  • 117

4 Answers4

3

Why use a Subselect? INSERT INTO ... SELECT exists:

INSERT INTO crawlLog (companyId, timeStartCrawling)
SELECT companies.id, NOW()
FROM companies
LEFT OUTER JOIN crawlLog
ON companies.id = crawlLog.companyId
WHERE crawlLog.companyId IS NULL
LIMIT 1

And that way it should not complain about using a table both in the INSERT and SELECT part

wonk0
  • 13,402
  • 1
  • 21
  • 15
1

You can't update rows which you are querying. There is a way to force MySQL to use a temporary table implicitly:

INSERT INTO crawlLog (companyId, timeStartCrawling)
VALUES
(
    SELECT id, when FROM
    (
    SELECT companies.id AS id, now() AS when FROM companies
    LEFT OUTER JOIN crawlLog
    ON companies.id = crawlLog.companyId
    WHERE crawlLog.companyId IS NULL
    LIMIT 1
    )
)
Community
  • 1
  • 1
André Laszlo
  • 15,169
  • 3
  • 63
  • 81
1

This works and seems like the simplest solution:

Using the simpler of the two statements in my question, I created an alias for the inner crawlLog table as suggested by @Tocco in the comments, and then removed the necessary encapsulation in VALUES().

INSERT INTO crawlLog (companyId, timeStartCrawling)
SELECT id, now()
FROM companies
WHERE id NOT IN
(
    SELECT companyId
    FROM crawlLog AS crawlLogAlias
)
LIMIT 1
T. Brian Jones
  • 13,002
  • 25
  • 78
  • 117
  • 1
    Not sure if it's a problem or not, but putting a SELECT inside of VALUES seems odd (unnecessary). – Chains Jul 08 '11 at 21:06
0

Do the select into a temp table, then insert selecting from the temp table. You can't insert into a table and select from it in the same statement, so use a temp table and two statements.

antlersoft
  • 14,636
  • 4
  • 35
  • 55