Basically, the number of vertices used by EXTRACT is determined by the following:
- Number of files (currently at most one file per vertex) if you use file sets or request AtomicFileProcessing=true (e.g., JSON, current Avro Extractor).
- Size of a file (currently 1GB per vertex) if the file is considered splittable (AtomicFileProcessing=false, e.g., Csv/Tsv extractors).
The ROWCOUNT hint only tells the optimizer the expected resulting row count, which affects the subsequent partitioning. It does not change the number of vertices used for the extraction itself.
The Analytics Units allocation mentioned by Omid then gives you the actual degree of parallelism used to run the determined number of vertices, so over-specifying the Analytics Units will NOT make your code parallelize more.
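To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. It is not how the U-SQL compiler actually plans a job. The function names are made up for illustration, "1GB" is assumed to mean 2**30 bytes, and each splittable file is assumed to be cut into chunks independently of the others.

```python
import math

GB = 1024 ** 3  # assumption: "1GB" is taken as 2**30 bytes


def estimate_extract_vertices(file_sizes, atomic, split_size=GB):
    """Rough vertex estimate for one EXTRACT (hypothetical helper).

    file_sizes: list of file sizes in bytes.
    atomic: True for file sets / AtomicFileProcessing=true extractors.
    """
    if atomic:
        # Atomic processing: a vertex never handles more than one file,
        # so the estimate is simply the number of files.
        return len(file_sizes)
    # Splittable files: assume roughly one vertex per split_size chunk,
    # with at least one vertex for every file.
    return sum(max(1, math.ceil(size / split_size)) for size in file_sizes)


def effective_parallelism(vertices, analytics_units):
    # AUs beyond the vertex count have nothing to run, so they stay idle.
    return min(vertices, analytics_units)


sizes = [5 * GB, 100]  # one 5GB file and one tiny file
print(estimate_extract_vertices(sizes, atomic=True))
print(estimate_extract_vertices(sizes, atomic=False))
print(effective_parallelism(6, 20))
```

Under these assumptions, asking for 20 Analytics Units on a job that only has 6 extraction vertices still runs at most 6 of them at once.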
Why do you want to increase the scale-out on the extraction?