How to extract all duplicated rows from a dataframe and delete them from the dataframe in PySpark, pandas

Question

I need to check for duplicates based on the columns Articlenbr and amount and extract those duplicates into another dataframe. For example, in the data below I want to extract the first two rows, save them in another dataframe, and delete them from the original dataframe. How can this be done in PySpark?

Duplicated rows (save in another dataframe):

Original dataframe:

Does this answer your question? [Remove pandas rows with duplicate indices](https://stackoverflow.com/questions/13035764/remove-pandas-rows-with-duplicate-indices). or https://stackoverflow.com/questions/14657241/how-do-i-get-a-list-of-all-the-duplicate-items-using-pandas-in-python — Emma, Nov 18 '22 at 16:52

score 0 · Answer 1 · answered Nov 18 '22 at 16:33

Try this:

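# pandas: count the rows in each Articlenbr group
dups = df.groupby('Articlenbr').count()
# keep the Articlenbr values that appear more than once
dups = dups[dups['amount'] > 1].index.values
# select every row whose Articlenbr is duplicated
df[df['Articlenbr'].isin(dups)]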
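
The snippet above matches on Articlenbr only and does not remove anything from the original dataframe. As a minimal pandas sketch of the full split the question asks for, the version below flags duplicates on both Articlenbr and amount with `DataFrame.duplicated(keep=False)`. The sample values and the extra `note` column are hypothetical, since the question's data is not shown; only the two column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names Articlenbr and amount
# come from the question.
df = pd.DataFrame({
    "Articlenbr": [101, 101, 102, 103],
    "amount": [5, 5, 7, 9],
    "note": ["a", "b", "c", "d"],
})

key_cols = ["Articlenbr", "amount"]

# keep=False marks every row of a duplicated group, not only the repeats
is_dup = df.duplicated(subset=key_cols, keep=False)

duplicates = df[is_dup].copy()  # rows to save in another dataframe
df = df[~is_dup].copy()         # original dataframe without those rows
```

This is pandas only. In PySpark a comparable split is usually built from a row count over a window partitioned by the same columns, then filtering on that count.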

gtomer
