7

I am new to pandas and would like to filter a dataframe so that it includes only the top 5 values in the list. What is the best way to get those 5 values with this code?

My Code:

cheese_top5 = cheese[cheese.year >= 2016]
Julia Meshcheryakova
Youkesen
  • What type of variable are the values? Are they years, integers etc....? – thefragileomen Nov 23 '17 at 20:15
  • It is a dataset where I have to select the top favorite names. I have already tried many ways but didn't find a solution. Now I'm trying this one: bnames_top5 = bnames.sort_values('year') followed by bnames_top5[bnames_top5 >= 2011]. I just want to filter the top 5. – Youkesen Nov 23 '17 at 20:25
  • How can I select just 10 rows and 3 columns? The whole CSV file has 1891894 rows × 4 columns. – Youkesen Nov 23 '17 at 20:27
  • 1
    Please provide a small (3-7 rows) reproducible sample data set and your desired data set. Please read [how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and edit your post correspondingly. – MaxU - stand with Ukraine Nov 23 '17 at 20:28
  • 2. Exploring Trends in Names One of the first things we want to do is to understand naming trends. Let us start by figuring out the top five most popular male and female names for this decade (born 2011 and later). Do you want to make any guesses? Go on, be a sport!! In [120]: # bnames_top5: A dataframe with top 5 popular male and female names for the decade bnames_top5 = bnames.sort_values('year') bnames_top5[bnames_top5 >= 2011] – Youkesen Nov 23 '17 at 20:30
  • 3
    Where is your data? What is your expected output? I want to see 5-10 rows of your data along with what your desired output is. Look at how to give a [mcve] and learn [ask]. Thanks. – cs95 Nov 23 '17 at 22:29

6 Answers

15

I think what you are looking for is:

cheese.sort_values(by=['Name of column'], ascending=False).head(5)

To say anything more, we need to see a sample of your data.
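
As a minimal sketch of the sort-then-head approach, assuming a hypothetical `cheese` frame with made-up `year` and `consumption` columns (the asker's real columns are not shown): filter first, then sort in descending order so that `head(5)` holds the largest values.

```python
import pandas as pd

# Hypothetical sample data; the real column names are not shown in the question.
cheese = pd.DataFrame({
    "year": [2014, 2015, 2016, 2016, 2017, 2017, 2017, 2018],
    "consumption": [10.5, 11.0, 12.3, 9.8, 13.1, 12.9, 8.4, 14.2],
})

# Keep rows from 2016 onward, as in the question's filter.
recent = cheese[cheese["year"] >= 2016]

# Sort descending so that head(5) returns the five largest values.
cheese_top5 = recent.sort_values(by="consumption", ascending=False).head(5)
print(cheese_top5)
```

With the default `ascending=True`, the same call would return the five smallest values instead.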

Mark
10

You can use the pandas method nlargest:

df['column'].nlargest(n=5)

Reference: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.nlargest.html
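
A short sketch of how `nlargest` behaves, using a made-up frame `df` with hypothetical `name` and `score` columns: called on a single column it returns only that column's values, while `DataFrame.nlargest` (the method the reference describes) keeps the whole rows.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "name": list("abcdefg"),
    "score": [3, 9, 1, 7, 5, 8, 2],
})

# Series.nlargest: the five largest values of one column (original index kept).
top_values = df["score"].nlargest(n=5)

# DataFrame.nlargest: the five rows with the largest 'score', all columns kept.
top_rows = df.nlargest(5, "score")

print(top_values)
print(top_rows)
```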

lux7
1
dataframe_name['field_name'].value_counts().head(5)
Tomerikoo
  • 1
    Generally, answers are much more helpful if they include an explanation of what the code is intended to do, and why that solves the problem without introducing others. – DCCoder Sep 19 '20 at 03:27
1
import pandas as pd
df = pd.read_csv('911.csv')
df['zip'].value_counts().head(5)
StupidWolf
0

To get the 5 most frequently occurring values, use:

df['column'].value_counts().head(5)

The solution provided by @lux7

df['column'].nlargest(n=5)

would return the 5 largest values in the column (the values themselves, not how many times each appears).
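
The difference can be sketched on a small made-up column (the data below is hypothetical and chosen so that no two values share the same count):

```python
import pandas as pd

# Hypothetical data: 7 appears 5 times, 3 four times, 9 three times, 1 twice, 8 once.
df = pd.DataFrame({"column": [7] * 5 + [3] * 4 + [9] * 3 + [1] * 2 + [8]})

# Most frequent values: the index holds the values, the data holds their counts.
most_frequent = df["column"].value_counts().head(5)

# Largest values: repeated values are returned as often as they occur.
largest = df["column"].nlargest(n=5)

# Largest distinct values: drop repeats before calling nlargest.
largest_distinct = df["column"].drop_duplicates().nlargest(n=5)

print(most_frequent)
print(largest)
print(largest_distinct)
```

Here `value_counts` ranks 7 first because it occurs most often, whereas `nlargest` starts with the three 9s because they are the biggest numbers.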

horace_vr
0

Top 5 values in a column called 'Column_name' in a dataframe called 'df'.

method 1:

df.sort_values('Column_name', ascending=False).head(5)

method 2:

df['Column_name'].nlargest(n=5)
Julia Meshcheryakova
  • Hi Will There is already an answer in this thread that says this, and duplicating answers is not recommended. When you reach [15 reputation](https://stackoverflow.com/help/privileges) you will be able to "vote up" helpful answers. Take a look at [how to answer](https://stackoverflow.com/help/how-to-answer) and find new, well-asked questions. – Alexander L. Hayes Jan 01 '23 at 01:12