I want to drop 20% of the rows whose label column does not contain 'p' or 'u'. I know how to drop all of those rows, but I do not know how to drop only a certain percentage of them. This is my code:
import pandas as pd
df = pd.DataFrame({
    "text": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "label": ["o-o-o", "o-o", "o-u", "o", "o-o-p-o", "o-o-o-o-o-o", "p-o-o", "o-o"],
})
print(df)
# Keeps only the rows whose label contains 'p' or 'u', i.e. drops ALL other rows
df = df[(df["label"].str.contains('p')) | (df["label"].str.contains('u'))]
print(df)
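
For reference, here is a minimal sketch of the behaviour I am after, assuming that "20%" means a random 20% of the non-matching rows and that rounding to the nearest whole row is acceptable. It starts again from the original DataFrame; the names has_p_or_u, to_drop and result and the random_state seed are only illustrative choices, not part of my original code.

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "label": ["o-o-o", "o-o", "o-u", "o", "o-o-p-o", "o-o-o-o-o-o", "p-o-o", "o-o"],
})

# True for rows whose label contains 'p' or 'u' (regex alternation).
has_p_or_u = df["label"].str.contains("p|u")

# Randomly pick 20% of the rows that do NOT match.
# random_state is an arbitrary seed (an assumption), used only for repeatability.
to_drop = df[~has_p_or_u].sample(frac=0.2, random_state=0).index

# Remove only the sampled rows; every row containing 'p' or 'u' is kept.
result = df.drop(to_drop)
print(result)
```

With this sample data, 5 rows lack both 'p' and 'u', so 20% of them amounts to a single dropped row and 7 rows remain.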