How to speed up head function execution time in Koalas?

Question

For large datasets, the Koalas head(n) function takes a very long time. My understanding is that it brings all the data back to the driver node and then returns the absolute first n rows.

Is there a quick way to inspect n rows in Koalas so that only one or a few partitions are involved? I do not necessarily need the absolute first n rows; they can be spread across different executor nodes or all reside within the same partition.

score 2 · Accepted Answer · answered Aug 31 '22 at 14:19

Adding this statement right after importing Koalas seemed to help in my case:

koalas.set_option('compute.default_index_type', 'distributed-sequence')
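A minimal sketch of how this setting might be applied and checked, assuming Koalas is imported as databricks.koalas. As far as I understand the Koalas options documentation, the default index type is 'sequence', which builds the index with a non-partitioned window function and can pull the whole dataset onto a single node, whereas 'distributed-sequence' builds it in a distributed way. The helper names use_distributed_sequence_index and demo_head are hypothetical, made up for illustration; only get_option, set_option, range, and head are Koalas calls.

```python
OPTION = "compute.default_index_type"


def use_distributed_sequence_index(ks):
    """Switch the default index type and return the previous value.

    `ks` is assumed to be the imported Koalas module (databricks.koalas),
    which exposes get_option and set_option.
    """
    previous = ks.get_option(OPTION)
    ks.set_option(OPTION, "distributed-sequence")
    return previous


def demo_head(n=5):
    """Hypothetical demo; needs Koalas and a working Spark installation."""
    import databricks.koalas as ks

    # Set the option before creating any Koalas DataFrame so that the
    # default index attached to new frames uses the distributed variant.
    use_distributed_sequence_index(ks)

    kdf = ks.range(1_000_000)  # example data, not from the original post
    return kdf.head(n)
```

Calling demo_head() on a cluster and comparing its run time with and without the option set is one way to see whether the change helps for a given dataset; the effect is likely to depend on data size and partitioning.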

Dean
