0

I am trying to find duplicate values of a dataframe column and its count in descending order.

Able to achieve with below code

dfNew = df.pivot_table(index=['Title'],aggfunc='size').sort_values(ascending=False)

However, when I print the dfNew, the results are coming as expected but the title of columns is blank. How to ensure results display under the Title and count columns

<table>

<tbody>
<tr>
<td>Australia Beach</td>
<td>2</td>
</tr>
<tr>
<td>Canada Beach</td>
<td>5</td>
</tr>
</tbody>
</table>
shafee
  • 15,566
  • 3
  • 19
  • 47
TechSavy
  • 11
  • 4
  • Does this helps ? https://stackoverflow.com/questions/42099024/pandas-pivot-table-rename-columns – Surjit Samra Jan 01 '23 at 07:20
  • 1
    Can you provide a clear, minimal reproducible input/output example of what you're trying to achieve? – mozway Jan 01 '23 at 07:27
  • am trying to find duplicate values of a dataframe column and its count by descending order. Able to achieve with below code dfNew = df.pivot_table(index=['Title'], aggfunc='size').sort_values(ascending=False) However when i print the dfNew, the results are coming as expected but the title of columns is blank. How to ensure results display under the Title and count columns – TechSavy Jan 01 '23 at 07:42
  • Please read [this](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – mozway Jan 01 '23 at 07:49
  • Basically I want to introduce count column as well, since i calculated the count based on the Title column from dataframe. Issue is while displaying datframe, I am getting results (title and respective count) however the respective columns are empty ( i am expecting title and count columns) – TechSavy Jan 01 '23 at 07:53
  • see the table structure above, basically i am missing the column names – TechSavy Jan 01 '23 at 09:27
  • ok finally achieved using below code dfNew = df.groupby(['Title']).size().sort_values(ascending=False).reset_index(name="count") – TechSavy Jan 01 '23 at 09:58

1 Answers1

0

Assuming the dataframe you're starting with looks something like this:

df = pd.DataFrame({
    "Title": [
        'Australia Beach', 
        'Canada Beach', 
        'Australia Beach', 
        'Canada Beach', 
        'Canada Beach', 
        'Canada Beach', 
        'Canada Beach'
    ]
})
print(df)
             Title
0  Australia Beach
1     Canada Beach
2  Australia Beach
3     Canada Beach
4     Canada Beach
5     Canada Beach
6     Canada Beach

A much simpler way to do it is with df.value_counts():

df.value_counts()
Title          
Canada Beach       5
Australia Beach    2
dtype: int64

This function gives you the same output as df.pivot_table(index=['Title'],aggfunc='size').sort_values(ascending=False), but the code is much more concise. It returns a pandas Series object, but it sounds like you want to return a dataframe with Title and Count as the column names. You can do this simply with the following:

pd.DataFrame(df.value_counts(), columns=['count']).reset_index()
             Title  count
0     Canada Beach      5
1  Australia Beach      2
jglad
  • 120
  • 2
  • 2
  • 13