How to sort, group and display duplicated values of a column

Asked Sep 12 '19 at 14:38

Active Sep 12 '19 at 14:38


I would like to sort, group and display the duplicated values of a column in table form. I found some code snippets in this thread, but they produce different output. Which approach is better, and what are the differences between them?

pd.concat(g for _, g in df.groupby("column_name") if len(g) > 1)

The above shows values with special characters but doesn't show NaN.
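The missing NaN rows are consistent with the default behavior of groupby, which drops rows whose group key is NaN before any group reaches pd.concat. Below is a minimal sketch of that effect, assuming pandas 1.1 or later (where the dropna parameter is available). The sample frame and the "value" column are made up for illustration, and "column_name" is the question's own placeholder.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "column_name" mirrors the question's placeholder.
df = pd.DataFrame({
    "column_name": ["a", "b#", "a", np.nan, "b#", np.nan, "c"],
    "value": range(7),
})

# Default behavior: rows with a NaN key are discarded by groupby,
# so they can never appear in the concatenated result.
default_dupes = pd.concat(
    [g for _, g in df.groupby("column_name") if len(g) > 1]
)

# dropna=False keeps NaN as a group of its own, so duplicated NaN rows survive.
with_nan_dupes = pd.concat(
    [g for _, g in df.groupby("column_name", dropna=False) if len(g) > 1]
)

print(default_dupes)
print(with_nan_dupes)
```

On this toy frame the first result contains only the "a" and "b#" rows, while the second also includes the two NaN rows.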

>>> ids = df["column_name"]
>>> df[ids.isin(ids[ids.duplicated()])].sort_values("column_name")

The above shows NaN but not special characters.

df[df['column_name'].duplicated() == True]

This gives completely different results from the two above.
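One likely reason the last snippet differs is that duplicated() defaults to keep="first", which flags only the second and later occurrences of each value, so the first occurrence of every duplicated value is left out. Passing keep=False flags every member of a duplicated set. The sketch below compares the variants on made-up data; it does not reproduce the special-character difference described above, whose cause cannot be determined without the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "column_name" mirrors the question's placeholder.
df = pd.DataFrame({"column_name": ["a", "b#", "a", np.nan, "b#", np.nan, "c"]})
ids = df["column_name"]

# Default keep="first": only the second and later occurrences are flagged.
later_only = df[ids.duplicated()]

# keep=False: every row whose value occurs more than once is flagged,
# including the first occurrence.
all_dupes = df[ids.duplicated(keep=False)].sort_values("column_name")

# The isin-based snippet from the question, for comparison.
via_isin = df[ids.isin(ids[ids.duplicated()])].sort_values("column_name")

print(later_only)
print(all_dupes)
print(via_isin)
```

On this toy frame, keep=False and the isin-based snippet select the same six rows (NaN included, sorted last), while the default call returns only three.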


Organic Heart

You should probably add the output as well – Revolucion for Monica Sep 15 '19 at 22:57
Can you give some example input data? Which kind of special characters? Are you working in utf-8 and using accents etc? – Robert Feb 02 '21 at 06:10
I tried both approaches and got the same result with my dataset: indices = ints; testcontent = "abc_123456" with random letters and numbers – Robert Feb 02 '21 at 07:13


0 Answers
