Pandas Equivalent of R's which()

Question

Variations of this question have been asked before, but I'm still having trouble understanding how to slice a Python Series or pandas DataFrame based on conditions that I set.

In R, what I'm trying to do is:

df[which(df[,colnumber] > somenumberIchoose),]

The which() function finds indices of row entries in a column in the dataframe which are greater than somenumberIchoose, and returns this as a vector. Then, I slice the dataframe by using these row indices to indicate which rows of the dataframe I would like to look at in the new form.

Is there an equivalent way to do this in python? I've seen references to enumerate, which I don't fully understand after reading the documentation. My sample in order to get the row indices right now looks like this:

indexfuture = [ x.index(), x in enumerate(df['colname']) if x > yesterday]

However, I keep on getting an invalid syntax error. I can hack a workaround by for looping through the values, and manually doing the search myself, but that seems extremely non-pythonic and inefficient.

What exactly does enumerate() do? What is the pythonic way of finding indices of values in a vector that fulfill desired parameters?

Note: I'm using Pandas for the dataframes

can you try: `[a.index() for (a, b) in enumerate(df['colname']) if b > yesterday]` — , Aug 01 '14 at 18:04
Just to be clear, pandas DataFrames can have all sorts of indices, not just integers. Do you only want integer indices, or the actual original row-indices? — smci, Nov 27 '16 at 17:31
Related question [Python equivalent of which() in R](http://stackoverflow.com/questions/12207014/python-equivalent-of-which-in-r) — smci, Nov 27 '16 at 17:35
The question asks about `which()` which returns a vector of indices in which some condition was met. The top answer is about boolean subsetting. [This post](https://stackoverflow.com/questions/21800169/python-pandas-get-index-of-rows-which-column-matches-certain-value) contains what I see as an actual equivalent to `which()`. — Hendy, Aug 27 '17 at 17:12

score 14 · Accepted Answer · answered Aug 01 '14 at 20:53

I may not have understood the question clearly, but it looks like the answer is simpler than you think.

Using a pandas DataFrame:

df['colname'] > somenumberIchoose

returns a pandas series with True / False values and the original index of the DataFrame.

Then you can use that boolean series on the original DataFrame and get the subset you are looking for:

df[df['colname'] > somenumberIchoose]

should be enough.

See http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing
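
As a minimal sketch of the steps above (the frame, its index and the threshold are made up for illustration), the same boolean mask can filter rows, give the matching index labels, or give integer positions, which is probably the closest match to R's which(). Note that the positions are 0-based here, whereas R counts from 1.

```python
import numpy as np
import pandas as pd

# Hypothetical data; 'colname' and the threshold are illustrative only.
df = pd.DataFrame({"colname": [5, 12, 7, 20]}, index=["a", "b", "c", "d"])
threshold = 6

mask = df["colname"] > threshold             # boolean Series, same index as df
subset = df[mask]                            # rows where the condition holds
labels = df.index[mask]                      # index labels of those rows
positions = np.flatnonzero(mask.to_numpy())  # 0-based integer positions
```

If the frame has a default integer index, labels and positions coincide; with any other index they differ, so it is worth deciding which of the two is actually needed.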

`df[df['colname'] > somenumberIchoose].index` is the same as R which() function — Chris, Oct 27 '21 at 18:53

score 8 · Answer 2 · edited Aug 01 '14 at 18:32

From what I know of R, you might be more comfortable working with numpy -- a scientific computing package similar to MATLAB.

If you want the indices of an array whose values are divisible by two, then the following would work.

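import numpy

arr = numpy.arange(10)
truth_table = arr % 2 == 0          # boolean array: True where the value is even
indices = numpy.where(truth_table)  # tuple holding one array of matching positions
values = arr[indices]               # the even values themselves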

It's also easy to work with multi-dimensional arrays:

arr2d = arr.reshape(2, 5)
row_index = 0  # row to inspect (example value)
col_indices = numpy.where(arr2d[row_index] % 2 == 0)[0]
row_values = arr2d[row_index, col_indices]
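
To search a whole 2-D array rather than a single row, a sketch along these lines may help. It relies on the fact that numpy.where, called with only a condition, returns one index array per axis; the variable names are illustrative.

```python
import numpy as np

arr2d = np.arange(10).reshape(2, 5)

# With a single condition argument, np.where returns one index array per axis.
rows, cols = np.where(arr2d % 2 == 0)

# Pair the arrays up to get the (row, column) coordinates of each match.
coords = list(zip(rows.tolist(), cols.tolist()))

# Indexing with both arrays pulls out the matching values.
values = arr2d[rows, cols]
```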

edited Aug 01 '14 at 18:32

answered Aug 01 '14 at 18:18

Dunes


+1 for a solution much closer to the R idiom. Also I don't like to turn everything into a pandas dataframe. – horaceT Apr 20 '18 at 18:49

score 3 · Answer 3 · answered Aug 01 '14 at 18:05

enumerate() returns an iterator that yields an (index, item) tuple in each iteration, so you can't (and don't need to) call .index() again.

Furthermore, your list comprehension syntax is wrong. It should be:

indexfuture = [(index, x) for (index, x) in enumerate(df['colname']) if x > yesterday]

Test case:

>>> [(index, x) for (index, x) in enumerate("abcdef") if x > "c"]
[(3, 'd'), (4, 'e'), (5, 'f')]

Of course, you don't need to unpack the tuple:

>>> [tup for tup in enumerate("abcdef") if tup[1] > "c"]
[(3, 'd'), (4, 'e'), (5, 'f')]

unless you're only interested in the indices, in which case you could do something like

>>> [index for (index, x) in enumerate("abcdef") if x > "c"]
[3, 4, 5]
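
One caveat when applying this to pandas: enumerate() counts from 0, so it gives positions rather than index labels. The sketch below, using a made-up Series with a string index, contrasts it with Series.items(), which yields (label, value) pairs.

```python
import pandas as pd

# Hypothetical Series with a non-integer index, for illustration only.
s = pd.Series([3, 9, 4, 11], index=["w", "x", "y", "z"])
cutoff = 5

# enumerate counts from 0, so these are positions, not index labels.
positions = [i for i, value in enumerate(s) if value > cutoff]

# Series.items() yields (label, value) pairs instead.
labels = [label for label, value in s.items() if value > cutoff]
```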

There's no need to use `enumerate()`, it's inefficient, and even if there was a need, pd.DataFrame has `iterrows()` for exactly that purpose. — smci, Nov 27 '16 at 17:29

score 0 · Answer 4 · answered Jan 20 '16 at 21:22

And if you need an additional condition, pandas.Series supports element-wise operations between Series (+, -, /, *).

Just multiply the boolean masks:

idx1 = df['lat'] == 49
idx2 = df['lng'] > 15 
idx = idx1 * idx2

new_df = df[idx]
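
Multiplying boolean masks behaves like a logical AND. A more common spelling, sketched here with made-up coordinates, uses the element-wise operators &, | and ~ directly on the masks.

```python
import pandas as pd

# Hypothetical coordinates; column names follow the answer above.
df = pd.DataFrame({"lat": [49, 49, 50, 49], "lng": [14, 16, 17, 20]})

idx1 = df["lat"] == 49
idx2 = df["lng"] > 15

both = df[idx1 & idx2]     # rows matching both conditions
either = df[idx1 | idx2]   # rows matching at least one condition
not_first = df[~idx1]      # rows where the first condition is False
```

When the comparisons are written inline instead of being stored first, each one needs its own parentheses, because & binds more tightly than == and >.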

Manuel


score 0 · Answer 5 · answered Mar 30 '16 at 21:58

Instead of enumerate, I usually just use .items() (named .iteritems() in older pandas versions). This saves a .index(). Namely,

[k for k, v in (df['c'] > t).items() if v]

Otherwise, one has to do

df[df['c'] > t].index()

This duplicates the typing of the data frame name, which can be very long and painful to type.
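
Another way to avoid repeating the frame's name, offered as a sketch rather than as this answer's method, is to pass a callable to .loc. The long variable name and the data below are invented for illustration.

```python
import pandas as pd

# Hypothetical frame; 'c' and t mirror the names used in the answer.
a_rather_long_dataframe_name = pd.DataFrame({"c": [1, 8, 3, 9]})
t = 4

# The callable receives the frame itself, so its name is typed only once.
matching = a_rather_long_dataframe_name.loc[lambda d: d["c"] > t]
labels = matching.index.tolist()
```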

wdwd


I think it's just `df.index`, an attribute, not a function. I get an error that `'Int64Index' object is not callable` with `index()`. That said, both of these actually answer how one can do what `which()` does, so I like that! – Hendy Aug 27 '17 at 17:16

score 0 · Answer 6 · edited Aug 29 '18 at 00:44

A nice simple and neat way of doing this is the following:

SlicedData1 = df[df.colname > somenumber]

This can easily be extended to include other criteria, such as non-numeric data:

SlicedData2 = df[(df.colname1 > somenumber) & (df.colname2 == '24/08/2018')]

And so on...

edited Aug 29 '18 at 00:44

Joel


answered Aug 28 '18 at 21:44

Adr
