
I have an Oracle table containing 900 million records. This table is partitioned into 24 partitions and has indexes.

I tried using a hint and set the fetch buffer to 100000:

select /*+ parallel(8) */
* from table

It takes 30 minutes to get 100 million records.

My question is: is there any faster way to get all 900 million records (all the data in the table)? Should I use the partitions and run 24 sequential queries? Or should I use the indexes and split my query into, say, 10 queries?

    *"take 30 minutes to get 100 million records"* Get it to where How much of that time is query time and how much transport across a network and rendering in a client? Database query optimisation is all about the details. There are no simple heuristics which can solve all problems. Please read [this post on asking Oracle tuning questions](https://stackoverflow.com/a/34975420/146325). I explains the information you need to provide before we can attempt an answer, Although we can say that indexes won't help you speed up a full table read. – APC Nov 09 '19 at 16:05
  • I extract them to files. How can I calculate the time of transport across the network and rendering in the client? I am using a JDBC Scala library to extract the data. – code Nov 09 '19 at 16:18

2 Answers


The network is almost certainly the bottleneck here. Oracle parallelism only impacts the way the database retrieves the data, but data is still sent to the client with a single thread.

Assuming a single thread doesn't already saturate your network, you'll probably want to build a concurrent retrieval solution. It helps that the table is already partitioned, since you can then read large chunks of data without re-reading anything.

I'm not sure how to do this in Scala, but you want to run multiple queries like this at the same time, to use all the client and network resources possible:

select * from table partition (p1);
select * from table partition (p2);
...
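
As a rough sketch of that idea in Scala (the connection details, the table name big_table, and the partition names P1..P24 below are placeholders, not from the question; adapt them and the file-writing logic to your environment), each partition gets its own connection and query, and they run concurrently on a small thread pool:

import java.sql.DriverManager
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object PartitionExtract {
  // Placeholder connection details - replace with your own.
  val url  = "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1"
  val user = "extract_user"
  val pass = "extract_pass"

  // Read one partition on its own connection and write it to its own file.
  def extractPartition(partition: String): Unit = {
    val conn = DriverManager.getConnection(url, user, pass)
    try {
      val stmt = conn.createStatement()
      stmt.setFetchSize(10000)  // fewer network round trips per partition
      val rs = stmt.executeQuery(s"select * from big_table partition ($partition)")
      while (rs.next()) {
        // write the row to the output file for this partition
      }
      rs.close()
      stmt.close()
    } finally conn.close()
  }

  def main(args: Array[String]): Unit = {
    // One connection per concurrent extract; tune the pool size to what
    // the client machine and the network can actually sustain.
    implicit val ec: ExecutionContext =
      ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(6))
    val partitions = (1 to 24).map(i => s"P$i")
    val work = Future.sequence(partitions.map(p => Future(extractPartition(p))))
    Await.result(work, Duration.Inf)
  }
}

Each partition query reads a disjoint slice of the table, so nothing is fetched twice, and the concurrency sits on the client and network side, which is where the single-threaded bottleneck is.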
Jon Heller
  • Using partitions as you mentioned, I can get 455 million in just 20 minutes :) thank you – code Nov 10 '19 at 15:52

Not really an answer but too long for a comment.

Too many variables impact this to give properly informed advice, so the following are just some general hints.

Is this over a network or local on the server? If the database is on a remote server then you are paying a heavy network price. I would suggest (if possible) running the extract on the server using the BEQUEATH protocol to avoid using the network. Once the file(s) are complete, it will be quicker to compress and transfer them to the destination than to transfer the data directly from the database to a local file via JDBC row processing.

With JDBC, remember to set the cursor fetch size to reduce round trips - setFetchSize. The default value is tiny (10, I think); try something like 1000 to see how that helps.
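
For example, in the Scala/JDBC extract (conn and big_table are placeholders):

val stmt = conn.createStatement()
stmt.setFetchSize(1000)  // rows fetched per database round trip, instead of the tiny default
val rs = stmt.executeQuery("select * from big_table")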

As for the query, you are writing to a file, so even though Oracle might process the query in parallel, your write-to-file process probably doesn't, and it becomes the bottleneck.

My approach would be to write the Java program to operate off a range of values given as command line parameters, and experiment to find which range size and how many concurrent instances give optimal performance. The range will likely fall within discrete partitions, so you will benefit from partition pruning (assuming the range value is an indexed column, ideally the partition key).

Roughly speaking, I would start with a range of 5m and run a number of concurrent instances matching the number of CPU cores minus 2; this is not a scientifically derived number, just one that I tend to use as a first stab to see what happens.
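
A sketch of that range-driven worker in Scala (the column name id, the table name big_table, and the connection details are placeholders; launch several instances with different ranges side by side):

// e.g. run as: RangeExtract 0 5000000, RangeExtract 5000000 10000000, ...
object RangeExtract {
  def main(args: Array[String]): Unit = {
    val lo = args(0).toLong
    val hi = args(1).toLong
    val conn = java.sql.DriverManager.getConnection(
      "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1", "extract_user", "extract_pass")
    try {
      val ps = conn.prepareStatement(
        "select * from big_table where id >= ? and id < ?")
      ps.setFetchSize(1000)  // cut round trips for the bulk read
      ps.setLong(1, lo)
      ps.setLong(2, hi)
      val rs = ps.executeQuery()
      while (rs.next()) {
        // write the row to this instance's output file
      }
      rs.close()
      ps.close()
    } finally conn.close()
  }
}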

TenG