I am new to Dask, so please forgive me if this question seems silly. I am working with a Dask DataFrame that holds around 50 GB of string data. I need to preprocess this data, which is fast on a process-based cluster, before passing it to a machine learning algorithm, which is fast on a thread-based cluster. The problem is that the DataFrame operations are fast when I configure the cluster with processes but slow when I configure it with threads, while the machine learning step is the other way around. Therefore, I am looking for a way to switch from a process-based environment to a thread-based one.
Currently, I save the preprocessed data using the process-based cluster, close that cluster, and then start a new thread-based cluster to run the machine learning step.
Is there an alternative way to solve this problem?
Please help me in this regard.