I'm trying to downsample the rows of a dataframe in order to create a smaller dataframe. Assume the dataframe has several columns and each column holds predefined categorical values. How can I make sure that every distinct categorical value is still present in the new, resampled dataframe?
For example:
import pandas as pd

rows = [{'A': 'a', 'B': 'd', 'C': 'g'}, {'A': 'a', 'B': 'e', 'C': 'h'}, {'A': 'a', 'B': 'd', 'C': 'g'},
        {'A': 'c', 'B': 'f', 'C': 'i'}, {'A': 'c', 'B': 'd', 'C': 'g'}, {'A': 'b', 'B': 'e', 'C': 'h'}]
df = pd.DataFrame(rows)
df
Output of the code:
   A  B  C
0  a  d  g
1  a  e  h
2  a  d  g
3  c  f  i
4  c  d  g
5  b  e  h
Column 'A' contains the values 'a', 'b' and 'c'. How can I make sure that none of these values is lost after resampling?
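
To make the goal concrete, here is a minimal sketch of one possible approach, not a definitive solution: first reserve one randomly chosen row for each distinct value in the columns of interest, then fill the remaining slots with random rows. The helper name `downsample_keep_values` and its parameters `n`, `columns` and `seed` are hypothetical and introduced only for illustration.

```python
import pandas as pd

rows = [{'A': 'a', 'B': 'd', 'C': 'g'}, {'A': 'a', 'B': 'e', 'C': 'h'},
        {'A': 'a', 'B': 'd', 'C': 'g'}, {'A': 'c', 'B': 'f', 'C': 'i'},
        {'A': 'c', 'B': 'd', 'C': 'g'}, {'A': 'b', 'B': 'e', 'C': 'h'}]
df = pd.DataFrame(rows)


def downsample_keep_values(df, n, columns, seed=0):
    """Hypothetical helper: sample n rows, keeping every distinct value of `columns`."""
    # Shuffle once so the "first" row found for each value is a random one.
    shuffled = df.sample(frac=1, random_state=seed)

    # Reserve one row per distinct value in each requested column.
    keep = set()
    for col in columns:
        keep.update(shuffled.drop_duplicates(subset=col).index)

    if len(keep) > n:
        raise ValueError(f"n={n} is too small; {len(keep)} rows were reserved")

    # Fill the remaining slots with random rows that were not reserved.
    rest = shuffled.drop(index=list(keep))
    extra = rest.sample(n=n - len(keep), random_state=seed)
    return pd.concat([df.loc[sorted(keep)], extra]).sort_index()


# Keep 4 of the 6 rows without losing any value of column 'A'.
sample = downsample_keep_values(df, n=4, columns=['A'])
print(sample)
```

This greedy reservation is simple but may reserve more rows than strictly necessary when several columns are passed, so `n` has to be large enough to hold them. If only one column matters, `df.groupby('A').sample(n=1)` (available in pandas 1.1 and later, as far as I know) should also give one row per value.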