I have a dataframe that I need to split into two samples, where one sample must not contain any entries from the other. I can run two sample operations, but this does not guarantee that the two samples are disjoint:
df.sample(frac=0.8)
df.sample(frac=0.2)
I have also tried the following, but it throws the error ValueError: cannot compute isin with a duplicate axis.
df1 = df.sample(frac=0.8)
df[~df.isin(df1).all(1)]
What can be done to achieve this split?
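A likely cause of this error is that the dataframe's index contains duplicate labels, since isin with a DataFrame argument requires a unique index and unique columns. Below is a minimal sketch of one way around it, assuming the original index labels do not need to be preserved. The example frame and the names df_unique, df_80 and df_20 are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with a deliberately duplicated index.
df = pd.DataFrame(
    np.arange(20).reshape(10, 2),
    columns=list("AB"),
    index=[0, 1] * 5,
)

# Give every row a unique label so rows can be removed by label.
# Assumption: the original index values are not needed afterwards.
df_unique = df.reset_index(drop=True)

# Draw 80% of the rows, then take everything that was not drawn.
df_80 = df_unique.sample(frac=0.8, random_state=0)
df_20 = df_unique.drop(df_80.index)
```

Because the second part is built by dropping the labels of the first, the two parts cannot share a row, and together they cover the whole frame.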
piRSquared's edit
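import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(200).reshape(100, 2), columns=list('AB'))
n_80pct = df.shape[0] // 5 * 4  # number of rows in the 80% part
df_sampled = df.sample(frac=1)  # shuffle all rows
df_80 = df_sampled.iloc[:n_80pct]  # first 80% of the shuffled rows
df_20 = df_sampled.iloc[n_80pct:]  # remaining 20%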
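The shuffle-then-slice idea above can be wrapped in a small helper. This is only a sketch: the function name shuffle_split and its parameters are hypothetical, and random_state is passed through to df.sample so the split can be reproduced. Since the slicing is positional, it should also work when the index has duplicate labels.

```python
import numpy as np
import pandas as pd


def shuffle_split(df, frac=0.8, random_state=None):
    """Shuffle the rows once, then cut the result into two disjoint parts.

    Hypothetical helper: `frac` is the share of rows in the first part.
    """
    shuffled = df.sample(frac=1, random_state=random_state)
    n_first = int(round(len(shuffled) * frac))
    # Positional slicing, so no row can land in both parts.
    return shuffled.iloc[:n_first], shuffled.iloc[n_first:]


df = pd.DataFrame(np.arange(200).reshape(100, 2), columns=list("AB"))
df_80, df_20 = shuffle_split(df, frac=0.8, random_state=42)
```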