I have a dataframe with 1% positive classes (1's) and 99% negatives (0's), and I am working with a Logistic Regression in PySpark. I read here about dealing with unbalanced datasets, and the solution is to add a weightCol, as it says in the answer provided in the link, in order to tell the model to focus more on the 1's, as there are fewer of them.
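To make the setup concrete, here is a minimal sketch of one common way to derive such weights. It is not necessarily the exact recipe from the linked answer: the helper `balancing_weights` and the toy 1-vs-99 label list are made up for illustration, and the formula (total rows divided by number of classes times class count) is just one popular choice.

```python
from collections import Counter


def balancing_weights(labels):
    """Return one weight per class so that every class carries equal total weight.

    Hypothetical helper for illustration; not part of PySpark.
    """
    counts = Counter(labels)
    n_rows = len(labels)
    n_classes = len(counts)
    # weight = n_rows / (n_classes * class_count): rarer classes get larger weights
    return {cls: n_rows / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy data mirroring the 1% / 99% split described above
labels = [1] * 1 + [0] * 99
weights = balancing_weights(labels)

# One weight per row; this is the kind of value that would go into the weight column
row_weights = [weights[y] for y in labels]
```

In PySpark the per-row weight would be stored in a column (say `classWeight`, a name assumed here) and passed as `LogisticRegression(weightCol="classWeight")`.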
I've tried it and it works well, but I don't know how mllib balances the data internally. Does anyone have a clue? I don't like working with "black boxes" I can't comprehend.