I'm using the oracle_fdw extension in my PostgreSQL database. I'm trying to copy the data of many tables in the Oracle database into my PostgreSQL tables. I'm doing so by running `insert into local_postgresql_temp select * from remote_oracle_table`. The performance of this operation is very slow, so I tried to check the reason and maybe choose a different alternative.

1) First method - `insert into local_postgresql_table select * from remote_oracle_table`. This generated a total disk write of 7 MB/s and an actual disk write of 4 MB/s (iotop). For a 32 GB table it took me 2 hours and 30 minutes.

2) Second method - `copy (select * from oracle_remote_table) to '/tmp/dump'`. This generates a total disk write of 4 MB/s and an actual disk write of 100 KB/s. The COPY utility is supposed to be very fast, but here it seems very slow.
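
For reference, the two statements look roughly like this (the table names are placeholders for my real tables):

```sql
-- Method 1: pull the rows through oracle_fdw straight into the local table
INSERT INTO local_postgresql_table
SELECT * FROM remote_oracle_table;

-- Method 2: write the foreign table's contents out to a flat file on the server
COPY (SELECT * FROM oracle_remote_table) TO '/tmp/dump';
```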

- When I run COPY from the local dump, the read is very fast, about 300 MB/s.

- I created a 32 GB file on the Oracle server and used scp to copy it, and it took only a few minutes.

- The WAL directory is located on a different file system. The parameters I assigned:

min_parallel_relation_size = 200MB
max_parallel_workers_per_gather = 5 
max_worker_processes = 8 
effective_cache_size = 12GB
work_mem = 128MB
maintenance_work_mem = 4GB
shared_buffers = 2000MB
RAM: 16 GB
CPU cores: 8

How can I increase the write speed? How can I get the data from the Oracle database into my PostgreSQL database faster?

I ran perf on this entire process; the results are in the attached screenshot.

  • Is there a LOB column in the Oracle table? – Laurenz Albe Aug 21 '17 at 13:56
  • No, I don't have any LOB column. – JeyJ Aug 21 '17 at 14:12
  • If the `SELECT` is slow, try to play with the `prefetch` option of the foreign table. – Laurenz Albe Aug 21 '17 at 14:29
  • The select runs inside a begin/end block. How exactly do you advise using prefetch? Can you please provide an example? – JeyJ Aug 21 '17 at 14:49
  • For example `ALTER FOREIGN TABLE foreign_table OPTIONS (ADD prefetch '1000');` – Laurenz Albe Aug 21 '17 at 18:09
  • Should I do it for every table that has a lot of rows? – JeyJ Aug 22 '17 at 05:52
  • Not necessarily. Just try if different values make a difference in `SELECT` speed. This setting has nothing to do with the number of rows in the table. To find out where the time is spent, use profiling (`perf` or `oprofile` on Linux). – Laurenz Albe Aug 23 '17 at 03:45
  • Laurenz, the prefetch command isn't working. – JeyJ Aug 31 '17 at 06:59
  • Specify what you mean by *isn't working*. – Laurenz Albe Aug 31 '17 at 19:17
  • I tried it again and the command worked. However, I tried it on a foreign table that is 1.5 GB and it didn't change things dramatically. With the default settings it took 8 minutes and 35 seconds, and after I set the prefetch to 1000/5000/10000 it took 8.59/8.54/8.33 minutes. – JeyJ Sep 03 '17 at 07:55
  • Then prefetching has no impact on your problem. Try profiling PostgreSQL to see where the time is spent. – Laurenz Albe Sep 03 '17 at 10:44
  • I added the perf results to the question. – JeyJ Sep 04 '17 at 13:59
  • I cannot be sure, but if what you posted contains everything significant, it seems like the time is spent either on the Oracle server or on the network. Since playing with prefetch did not change anything, I suspect that the problem is outside of oracle_fdw. Tuning Oracle is outside of my competence. – Laurenz Albe Sep 04 '17 at 19:22
  • Is the prefetch option also available for local PostgreSQL tables? I mean, when I select from a local table, can I get the data faster by increasing the prefetch? – JeyJ Sep 05 '17 at 10:55
  • With local tables, data are being processed as they become available, row by row. If you are using the C client API, you can tell the PostgreSQL *client* to either retrieve all rows and buffer them in memory or use single row processing. – Laurenz Albe Sep 06 '17 at 10:37
  • How can I check which API I use? Which of those options is better, and how can I set one of them? – JeyJ Sep 06 '17 at 11:46
  • If you don't know which API you use, you probably didn't write the program that accesses PostgreSQL, and you couldn't switch to single row mode since that requires rewriting the code. The traditional way of fetching all rows at once is better unless there is a vast amount of data or you need to stream for some other reason. – Laurenz Albe Sep 07 '17 at 08:42
  • I didn't write the program. This entire system is about 100 functions in the database. I just run the main function from the shell with `psql -d mydb -c "select start_main()"`. So I don't use any API, just psql. – JeyJ Sep 07 '17 at 10:58
  • With `psql` you cannot use single row mode. But we have veered far from the question anyway. – Laurenz Albe Sep 11 '17 at 06:38

1 Answer


I'd concentrate on the bits you said were fast.

- I created a 32 GB file on the Oracle server and used scp to copy it, and it took only a few minutes.

- When I run COPY from the local dump, the read is very fast, about 300 MB/s.

I'd suggest you combine these two. Use a dump tool (or SQL*Plus) to export the data from Oracle into a file that PostgreSQL's COPY command can read. Generating the binary file format directly is a bit tricky, but producing a CSV version shouldn't be too hard. An example for that is at How do I spool to a CSV formatted file using SQLPLUS?
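
As a rough sketch of what that could look like (file names, table names, and the SQL*Plus settings are placeholders and may need adjusting for your environment; `SET MARKUP CSV ON` requires SQL*Plus 12.2 or later):

```sql
-- On the Oracle side (SQL*Plus): spool the table out as CSV.
SET MARKUP CSV ON
SET FEEDBACK OFF
SPOOL /tmp/remote_oracle_table.csv
SELECT * FROM remote_oracle_table;
SPOOL OFF

-- On the PostgreSQL side: bulk-load the file with COPY.
-- (If the file is only available on the client machine, use psql's \copy instead.)
COPY local_postgresql_table FROM '/tmp/remote_oracle_table.csv' (FORMAT csv, HEADER true);
```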

  • I can't do that because the import of the data is part of a big PostgreSQL transaction. – JeyJ Aug 21 '17 at 06:11