I need to check for duplicates based on the columns Articlenbr and amount, and extract those duplicates into another dataframe. In the example, I want to extract the first two rows, save them in another dataframe, and delete them from the original dataframe. How can this be done in PySpark?

Duplicated rows (to be saved in another dataframe): [image not available]

Original dataframe: [image not available]

  • Does this answer your question? [Remove pandas rows with duplicate indices](https://stackoverflow.com/questions/13035764/remove-pandas-rows-with-duplicate-indices). or https://stackoverflow.com/questions/14657241/how-do-i-get-a-list-of-all-the-duplicate-items-using-pandas-in-python – Emma Nov 18 '22 at 16:52

1 Answer

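Try this (note that this is pandas, not PySpark):

# Count how many rows share each (Articlenbr, amount) pair
counts = df.groupby(['Articlenbr', 'amount'])['Articlenbr'].transform('size')
dups = df[counts > 1]   # duplicated rows, kept in a separate dataframe
df = df[counts <= 1]    # original dataframe with the duplicates removed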
gtomer