I would like to partition a DataFrame in a stratified way. That is, the DataFrame has a column with a lot of zeros and just a few ones, and I would like to partition it so that every partition keeps the same ratio of zeros to ones, using a custom Partitioner, but I don't know how to do it.
I have found similar situations in "Stratified sampling with pyspark" and "Stratified sampling in Spark", but they use sampling instead of partitioning. Any ideas? This is the first time I'm trying to partition data in a custom way. I'm using Spark + Scala + DataFrames.
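
To make the goal concrete, here is a rough, untested sketch of the kind of thing I have in mind. It assumes the label column is numeric and holds only 0 and 1. `BucketPartitioner`, `stratifiedRepartition`, `labelCol` and `n` are names I made up for illustration. The idea is to number the rows of each class separately, deal them out round-robin over `n` partitions, and let a custom `Partitioner` route each row by that number. I am not sure this is correct or idiomatic.

```scala
import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical partitioner: the key is already the target partition index.
class BucketPartitioner(override val numPartitions: Int) extends Partitioner {
  override def getPartition(key: Any): Int = key.asInstanceOf[Int]
}

object StratifiedPartitioning {
  // Hypothetical helper. Assumes `labelCol` contains only the values 0 and 1.
  def stratifiedRepartition(spark: SparkSession, df: DataFrame,
                            labelCol: String, n: Int): DataFrame = {
    // Index the rows of each class separately and assign them round-robin,
    // so each partition gets roughly count(class) / n rows of that class.
    val keyed: RDD[(Int, Row)] = Seq(0, 1).map { label =>
      df.filter(col(labelCol) === label).rdd
        .zipWithIndex()
        .map { case (row, i) => ((i % n).toInt, row) }
    }.reduce(_ union _)

    // Shuffle each row to the partition named by its key, then drop the key.
    val partitioned = keyed.partitionBy(new BucketPartitioner(n)).values
    spark.createDataFrame(partitioned, df.schema)
  }
}
```

With this approach, if there are fewer ones than partitions, some partitions would get no ones at all, and I assume any later shuffle on the resulting DataFrame would lose the layout.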