Based on the columns Articlenbr and amount, I need to check for duplicates and extract those duplicate rows into another dataframe. For example, in the example below I want to extract the first two rows, save them in another dataframe, and delete them from the original dataframe. How can this be done in PySpark?
Does this answer your question? [Remove pandas rows with duplicate indices](https://stackoverflow.com/questions/13035764/remove-pandas-rows-with-duplicate-indices) or [How do I get a list of all the duplicate items using pandas in python?](https://stackoverflow.com/questions/14657241/how-do-i-get-a-list-of-all-the-duplicate-items-using-pandas-in-python) – Emma Nov 18 '22 at 16:52
1 Answer
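Try this (note that it uses pandas, not PySpark, and checks the Articlenbr and amount columns together):

```python
import pandas as pd

# True for every row whose (Articlenbr, amount) pair occurs more than once
mask = df.duplicated(subset=['Articlenbr', 'amount'], keep=False)
dups = df[mask]  # the duplicate rows, in a separate dataframe
df = df[~mask]   # the original dataframe with those rows removed
```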

gtomer