We have a feature that lets users drag and drop modules in the UI to form a data processing pipeline — for example, reading data, preprocessing, and classification training. After they are arranged, these modules are executed sequentially.
Each module starts a container (via Kubernetes) to run in. The results produced by the previous module are saved to cephfs as a file, and the next module reads that file and then performs its operation. This serialization/deserialization step is slow. We plan to use RAPIDS to speed up the pipeline: improve inter-module data exchange by keeping the data in GPU memory, and use cuDF/cuML instead of Pandas/scikit-learn for faster processing.
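To make the bottleneck concrete, here is a minimal sketch of the current file-based handoff. The module names, the placeholder transformations, and the use of a temp directory in place of the real cephfs mount are all illustrative assumptions, not our actual code:

```python
import os
import tempfile

import pandas as pd

# Stands in for the shared cephfs mount (illustrative only).
SHARED_DIR = tempfile.mkdtemp()

def preprocess_module(in_path: str, out_path: str) -> None:
    # Each module runs in its own container: it must first deserialize
    # the previous module's output from the shared filesystem...
    df = pd.read_csv(in_path)
    df["x"] = df["x"] * 2              # placeholder preprocessing step
    # ...and then serialize its own result back to disk for the next module.
    df.to_csv(out_path, index=False)

def training_module(in_path: str) -> int:
    # The next module pays the deserialization cost again.
    df = pd.read_csv(in_path)
    return len(df)                     # placeholder for actual training

raw = os.path.join(SHARED_DIR, "raw.csv")
pre = os.path.join(SHARED_DIR, "preprocessed.csv")
pd.DataFrame({"x": [1, 2, 3]}).to_csv(raw, index=False)
preprocess_module(raw, pre)
n_rows = training_module(pre)
```

Every module boundary incurs a full write-to-disk and read-from-disk round trip; this is the cost we hope to avoid by keeping the intermediate data resident in GPU memory between modules.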
We have already confirmed that these modules can be ported from Pandas/scikit-learn to cuDF/cuML. However, because each module runs in its own container, the container — and its process — disappears as soon as the module finishes, so the corresponding cuDF data cannot stay resident in GPU memory between modules.
Given this setup, is there any good advice on how to use RAPIDS to improve this pipeline?