
I am trying to do what should be a pretty straightforward insert statement in a postgres database. It is not working, but it's also not erroring out, so I don't know how to troubleshoot. This is the statement:

INSERT INTO my_table (col1, col2) SELECT col1, col2 FROM my_table_temp;

There are around 200m entries in the temp table, and 50m entries in my_table. The temp table has no index or constraints, but both columns in my_table have btree indexes, and col1 has a foreign key constraint.

I ran the first query for about 20 days. Last time I tried a similar insert of around 50m, it took 3 days, so I expected it to take a while, but not a month. Moreover, my_table isn't getting longer. Queried 1 day apart, the following produces the same exact number.

select count(*) from my_table;

So it isn't inserting at all. But it also didn't error out. And looking at system resource usage, it doesn't seem to be doing much of anything at all; the process isn't drawing resources.

Looking at other running queries, nothing else that I have permissions to view is touching either table, and I'm the only one who uses them.

I'm not sure how to troubleshoot since there's no error. It's just not doing anything. Any thoughts about things that might be going wrong, or things to check, would be very helpful.
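For reference, a couple of catalog queries can show whether a statement like this is actually blocked on a lock. (This assumes a pre-9.6 server, which exposes a boolean `waiting` column in `pg_stat_activity`; newer versions replaced it with `wait_event`/`wait_event_type`.)

```sql
-- Show every backend, its state, and whether it is waiting on a lock
SELECT pid, state, waiting, query
FROM pg_stat_activity;

-- Show ungranted lock requests, joined to the query that is waiting
SELECT l.pid, l.locktype, l.mode, a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE NOT l.granted;
```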

Reen
  • You may be hitting issues with [locking due to your btree index](https://stackoverflow.com/questions/41852423/insert-performance-bitmap-vs-b-tree) – JNevill Jan 10 '18 at 16:51
  • Thanks, I wasn't aware that was an issue. Each key in the btree references about 5k rows, so that may be a problem. Do you know of a way to check whether that locking is occurring? I can also try reindexing, but that'll take a while to check. Though probably not a whole month! – Reen Jan 10 '18 at 16:56
  • I'm not 100% sure that it's your issue, but it seems like a good place to start. [Check out this link](https://community.oracle.com/thread/1034701?tstart=0) for some ideas on how to see locks on tables. – JNevill Jan 10 '18 at 16:58
  • Your tag says "postgresql" but the comments above are geared towards Oracle, so querying the `v$session` views isn't going to be of much use to you. As far as your INSERT goes, are you sure it is being committed? If the statement is rolled back, you'll not get an error, nor will you get any data inserted. Are you using the `psql` client, or a GUI? – bma Jan 10 '18 at 17:08
  • Thanks. I noticed one of the links was for oracle so looked at postgres documentation instead. psql client itself. The process is active when I view pg_stat_activity, and was active for 20 days. So I don't think the issue is that it's doing the insert and then rolling back, it doesn't get that far. – Reen Jan 10 '18 at 18:16
  • After looking at the postgres documentation on index locking (and the stackoverflow question), it seems that having an index refer to many rows may slow the process down, but not stop it altogether. But so far 0 rows have been inserted. https://www.postgresql.org/docs/9.2/static/locking-indexes.html – Reen Jan 10 '18 at 18:20
  • There's also only one process, so it shouldn't be an issue of concurrent inserts, right? – Reen Jan 10 '18 at 18:22
  • When you query `pg_stat_activity`, are any of the active queries in a blocked state? It is remarkable to me that anything could take that long, given the number of rows you're dealing with. If you run `EXPLAIN INSERT ...`, is the SELECT from the temp table doing a full table scan? Not that that would account for *20 days* of execution. – bma Jan 10 '18 at 18:59
  • Nothing is in a blocked state, I don't think (`waiting` is `f`). Would you expect a table scan to take longer than a count(*)? Because that finishes in a couple of minutes. Looking at pg_locks, the only other thing using the table is a vacuum function, but that only started 3 hours ago. – Reen Jan 10 '18 at 19:07
  • Does anyone know if I should be able to observe the insert taking place? Hard to test this with a test table, but if I query select count(*) from my_table, will the number increase as the insert progresses? (it's something that happens too fast in a test table, so I'm not sure how to check) – Reen Jan 10 '18 at 19:09
  • The row count won't increase until the INSERT commits. Is col1 indexed on the child table (the one that is referencing the parent table's `col1` column). If no, the FK check will be doing a sequential scan for every row you are adding. Also, have you defined your FK as `DEFERRABLE INITIALLY DEFERRED`? That will defer validation and is generally desirable. – bma Jan 10 '18 at 19:20
  • Just remembered, (it's been 3 weeks...), the query was made through psycopg2 not in postgres directly. That means that it would have been in a transaction, so you're right, it won't have committed. col1 in my_table has a foreign key constraint referencing a column in other_table that is unique and has a btree index. Last thing that might be relevant, both columns are bigints. – Reen Jan 10 '18 at 19:25
  • I have not set my foreign key to deferrable initially deferred. I didn't know that was a thing. Reading about it now. – Reen Jan 10 '18 at 19:27
  • My suggestion is to test the same process in `psql` directly, or at the very least, confirm that a commit is being issued after the insert. – bma Jan 10 '18 at 19:30
  • The commit is automatic in the object I use to execute queries, so I don't think it was a commit issue. I've used that insert script before without any trouble in the past. I started another process about five hours ago directly in psql, though. select count(*) has not moved, (though I'm still not positive whether it should) – Reen Jan 10 '18 at 19:34
  • Short of seeing the EXPLAIN and EXPLAIN ANALYZE output (and the DDL for the tables), I don't know what else to suggest. – bma Jan 10 '18 at 19:36
  • 10 minutes into explain analyze insert..., and nothing's happened. No steps have been returned yet. – Reen Jan 10 '18 at 19:49
  • Thanks for helping me troubleshoot. I was at a loss for what to even look at next, (still am, but at least I've tried and learned a few more things.) – Reen Jan 10 '18 at 19:52
  • EXPLAIN ANALYZE output only shows up after the query finishes; plain EXPLAIN won't run the query, it just shows the query plan that would be generated for your SQL. Some other general performance tips for this operation, if you can afford them: set `synchronous_commit=off`, and drop any indexes on the destination table that are not being used by the constraints (FK, PK, etc.), then recreate them after the load. – bma Jan 10 '18 at 19:59
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/162930/discussion-between-bma-and-reen). – bma Jan 10 '18 at 20:04
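The `DEFERRABLE INITIALLY DEFERRED` suggestion from the comments can be sketched like this. (The constraint name and the referenced column are illustrative, not taken from the question; whether deferring validation actually speeds up a given bulk load depends on the workload.)

```sql
-- Drop the existing FK and recreate it with deferred validation,
-- so the check runs at COMMIT instead of as each row is inserted
ALTER TABLE my_table DROP CONSTRAINT my_table_col1_fkey;
ALTER TABLE my_table
  ADD CONSTRAINT my_table_col1_fkey
  FOREIGN KEY (col1) REFERENCES other_table (other_col)
  DEFERRABLE INITIALLY DEFERRED;
```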

4 Answers


For the sake of anyone stumbling onto this question in the future:

After a lengthy discussion (see linked discussion from the comments above), the issue turned out to be related to psycopg2 buffering the query in memory.
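The commit question can be ruled out by running the load in `psql` with an explicit transaction, so it's obvious whether a `COMMIT` was ever issued (psycopg2 opens a transaction implicitly, and nothing is persisted until `conn.commit()` is called or autocommit is enabled):

```sql
BEGIN;

INSERT INTO my_table (col1, col2)
SELECT col1, col2 FROM my_table_temp;

-- the new rows are invisible to other sessions until this point
COMMIT;
```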

Another useful note: inserting into a table with indices is slow, so it can help to remove them before bulk loads, and then add them again after.
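A minimal sketch of that drop-and-recreate pattern, assuming a non-constraint index named `my_table_col2_idx` (the name here is illustrative):

```sql
-- Drop indexes that aren't backing a constraint before the bulk load...
DROP INDEX my_table_col2_idx;

INSERT INTO my_table (col1, col2)
SELECT col1, col2 FROM my_table_temp;

-- ...and rebuild them afterwards in one pass
CREATE INDEX my_table_col2_idx ON my_table (col2);
```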

bma

In my case it was a TRIGGER on the same table I was updating and it failed without errors.

Deactivated the trigger and the update worked flawlessly.
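If a silently failing trigger is suspected, the triggers on a table can be listed from the catalog and temporarily disabled while testing (the trigger name below is illustrative; remember to re-enable it):

```sql
-- List user-defined triggers on the table
SELECT tgname, tgenabled
FROM pg_trigger
WHERE tgrelid = 'my_table'::regclass
  AND NOT tgisinternal;

-- Disable one while testing, then re-enable it
ALTER TABLE my_table DISABLE TRIGGER my_trigger;
ALTER TABLE my_table ENABLE TRIGGER my_trigger;
```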

mrbarletta
  • this is what happened to me as well, anyone know where to go to see any logs associated with triggers that failed? – jcollum May 02 '23 at 15:29

In my case it was a date format issue. I commented out the date attribute before inserting into the DB and it worked.

farruq

Try adding a semicolon at the end of your query to get the debug information. In my case it was failing because the client was not able to find the end of the insert statement.

Parveen Verma