3

I’d like to have a function df_out(df_in,val_min,val_max) that makes a sorted series/dataframe from another series/dataframe by picking rows where values in one column are within a defined range. E.g., if df_in looks like this:

Name      Age
John      13
Jack      19
Sylvia    21
Anna      14
Carlos    15
Vladimir  30
Gustav    28
Amie      24

I’d like df_out(df_in, 18, 25) to look like this:

Name      Age
Jack      19
Sylvia    21
Amie      24

What's the most "pythonic" way to do this? Thanks!

thor
Alpha

2 Answers

4

Why use a function when it is so easily done natively?

>>> df[df.Age.between(18, 25)]
     Name  Age
1    Jack   19
2  Sylvia   21
7    Amie   24

>>> df[df.Age.between(19, 24, inclusive="neither")]  # pandas >= 1.3; older versions used inclusive=False
     Name  Age
2  Sylvia   21
Alexander
2

Once you have the data in a DataFrame df with columns Name and Age, you can simply use

df[(min_val <= df.Age) & (df.Age <= max_val)]

Note that the seemingly redundant parentheses in the above expression are required: & binds more tightly than <=, so without them the comparison is parsed differently.
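To see why the parentheses matter, here is a minimal sketch, assuming a toy DataFrame built from the question's data. Because & has higher precedence than <=, the unparenthesized form is read as the chained comparison min_val <= (df.Age & df.Age) <= max_val, which asks for the truth value of a whole Series and raises a ValueError.

```python
import pandas as pd

# Toy data copied from the question.
df = pd.DataFrame({
    "Name": ["John", "Jack", "Sylvia", "Anna", "Carlos", "Vladimir", "Gustav", "Amie"],
    "Age": [13, 19, 21, 14, 15, 30, 28, 24],
})
min_val, max_val = 18, 25

# With parentheses: two boolean masks combined element-wise.
mask = (min_val <= df.Age) & (df.Age <= max_val)
selected = df[mask]

# Without parentheses: parsed as min_val <= (df.Age & df.Age) <= max_val,
# a chained comparison that needs a single True/False from a Series.
try:
    df[min_val <= df.Age & df.Age <= max_val]
    unparenthesized_error = None
except ValueError as exc:
    unparenthesized_error = exc
```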


You can wrap this in a function like so:

def df_limited(df, min_val, max_val):
    return df[(min_val <= df.Age) & (df.Age <= max_val)]
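The question also asks for a sorted result, which the function above does not provide. One possible variation, offered as a sketch and not as part of the original answer, adds a sort_values call and a column parameter; the name filter_range and the column argument are hypothetical.

```python
import pandas as pd

def filter_range(df, min_val, max_val, column="Age"):
    """Return rows whose `column` lies in [min_val, max_val], sorted by that column."""
    # Series.between is inclusive on both ends by default.
    mask = df[column].between(min_val, max_val)
    return df[mask].sort_values(column)

# Toy data copied from the question.
df = pd.DataFrame({
    "Name": ["John", "Jack", "Sylvia", "Anna", "Carlos", "Vladimir", "Gustav", "Amie"],
    "Age": [13, 19, 21, 14, 15, 30, 28, 24],
})
result = filter_range(df, 18, 25)
```

Passing the column name as an argument keeps the helper usable for columns other than Age; sort_values returns a new DataFrame and leaves the input untouched.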
Ami Tavory
  • Thanks. And is there a way to create a function for this, i.e. like: df_limited(df, min_val, max_val) = df[(min_val <= df.Age) & (df.Age <= max_val)] ? – Alpha Feb 19 '16 at 20:38
  • Great, thanks. I have a follow-up question. How can I interpolate over the resulting histogram and plot the result? Do I need to create a function from the data frame first? – Alpha Feb 23 '16 at 00:20
  • I suggest you ask that as a new question. StackOverflow isn't really built for a discussion on followup questions in the comments section. I'm sure people will answer it. – Ami Tavory Feb 23 '16 at 05:36