I am new to PySpark. My task is to copy data from a source folder to a destination folder using PySpark, so that the copying happens in parallel across the cluster.
In plain Python I can copy the data with:
from shutil import copytree
copytree(source, destination)
This recursively copies all the data, preserving the folder structure, using standard Python. I want to do the same task with PySpark on a cluster. How should I proceed? I am using YARN as the resource manager.
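Here is a sketch of what I was imagining, assuming the source and destination directories sit on a filesystem that is visible to both the driver and every executor (the paths below are made up): list the files on the driver, then distribute the per-file copies across executors with an RDD. Is this a reasonable approach, or is there a more standard way to do this on YARN?

```python
import os
from shutil import copy2


def list_relative_files(source):
    """Walk `source` and return every file path relative to it."""
    rel_paths = []
    for root, _dirs, files in os.walk(source):
        for name in files:
            rel_paths.append(os.path.relpath(os.path.join(root, name), source))
    return rel_paths


def copy_one(source, destination, rel_path):
    """Copy a single file, recreating its parent directories as needed."""
    src = os.path.join(source, rel_path)
    dst = os.path.join(destination, rel_path)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    copy2(src, dst)
    return rel_path


if __name__ == "__main__":
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("parallel-copy").getOrCreate()
    sc = spark.sparkContext

    # Hypothetical paths: must be reachable from the driver and every executor.
    source = "/shared/source"
    destination = "/shared/destination"

    files = list_relative_files(source)
    # Each partition copies its share of the files on the executors.
    sc.parallelize(files, 8).foreach(lambda p: copy_one(source, destination, p))
```

My worry is that `shutil` only works on local paths, so this would only parallelize correctly on shared storage (e.g. NFS); if the data is on HDFS I suspect something else entirely is needed.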