Is it possible to find index positions of certain rows in a big DataFrame (80,000 rows, 6 columns) using the iloc method?

Question

My DataFrame has six columns of float data and around 80,000 rows. One of the columns is "Current", and its value is sometimes negative. I want to find the index locations where "Current" is negative. My code is below:

current_index = my_df[(my_df["Current"]<0)].index.tolist()
print(current_index[:5])

This gives the output I wanted:

[0, 124, 251, 381, 512]

This works. Is it possible to write this code using the iloc method? I tried the following code, but it gives an error. I would also like to know which approach is the best and fastest.

current_index = my_df.iloc[(my_df["Current"]<0)]

The output is:

NotImplementedError: iLocation based boolean indexing on an integer type is not available
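A minimal reproduction of the error, assuming a recent pandas version and a default integer index (the four-row frame is a hypothetical stand-in for `my_df`). For a boolean Series mask on an integer index, iloc raises NotImplementedError; other index types or pandas versions may raise ValueError instead, so the sketch catches both. Converting the mask to a plain NumPy array with `to_numpy()` avoids the error.

```python
import pandas as pd

# Hypothetical stand-in for my_df (assumption: default integer RangeIndex)
df = pd.DataFrame({"Current": [1.0, -0.5, 2.0, -3.0]})

mask = df["Current"] < 0  # boolean Series that carries df's integer index

raised = None
try:
    df.iloc[mask]  # iloc rejects a boolean Series used as a mask
except (NotImplementedError, ValueError) as exc:
    raised = type(exc)
    print(f"{raised.__name__}: {exc}")

selected = df.iloc[mask.to_numpy()]  # a plain NumPy boolean array is accepted
print(selected)
```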

See https://stackoverflow.com/questions/31593201/pandas-iloc-vs-ix-vs-loc-explanation-how-are-they-different. iloc gives you rows based on the location given; it won't return indexes. Also, as the error says, it expects integers, whereas the condition gives boolean values. For it to work, add `.index` after the `()`. – iamklaus, Sep 09 '18 at 13:43
@SarthakNegi, I just tried `current_index = my_df.iloc[(my_df["Current"]<0).index]`. It gives an error. – Msquare, Sep 09 '18 at 13:52
Your objective was to get the indexes, right? Also, can you post the error message? It works fine for me. – iamklaus, Sep 09 '18 at 14:04
If the indices are not unique identifiers, you can simply use `np.where`. – Bharath M Shetty, Sep 09 '18 at 14:30
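A sketch of the `np.where` suggestion, using a hypothetical five-row frame whose index repeats the label 20, so positions and labels differ. `np.where` returns integer positions, which are safe to pass to iloc even when labels repeat; passing the labels to loc instead also picks up every other row that shares a matching label.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: label 20 appears twice, so the index is not a unique identifier
df = pd.DataFrame(
    {"Current": [1.0, -0.5, 2.0, -3.0, -1.0]},
    index=[10, 20, 20, 40, 50],
)

neg = (df["Current"] < 0).to_numpy()  # plain boolean ndarray

positions = np.where(neg)[0]  # integer positions of the negative rows: 1, 3, 4
labels = df.index[neg]        # their index labels: 20, 40, 50

print(positions)
print(list(labels))

# iloc with positions returns exactly the three negative rows
print(df.iloc[positions])

# loc with labels also returns the positive row that shares label 20 (4 rows in total)
print(len(df.loc[labels]))
```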

jpp · Accepted Answer · 2018-09-09T14:38:02.163

With iloc you need to use a Boolean array rather than a Boolean Series. For this, you can use pd.Series.values. Here's a demo:

import pandas as pd

df = pd.DataFrame({'Current': [1, 3, -4, 9, -3, 1, -2]})

res = df.iloc[df['Current'].lt(0).values].index

# Int64Index([2, 4, 6], dtype='int64')  (pandas 2.x displays this as Index([2, 4, 6], dtype='int64'))

Incidentally, loc works with either an array or a series.
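To illustrate the note about loc, a sketch on the same demo frame: loc accepts either the boolean Series (aligned on the index) or the bare array, while iloc needs the bare array. It uses `Series.to_numpy()`, which the pandas documentation recommends over `.values`; both give the underlying boolean array here.

```python
import pandas as pd

df = pd.DataFrame({'Current': [1, 3, -4, 9, -3, 1, -2]})
mask = df['Current'].lt(0)  # boolean Series

by_loc_series = df.loc[mask].index              # loc aligns the Series on the index
by_loc_array = df.loc[mask.to_numpy()].index    # loc also accepts a bare array
by_iloc_array = df.iloc[mask.to_numpy()].index  # iloc needs the bare array

print(list(by_loc_series), list(by_loc_array), list(by_iloc_array))
# [2, 4, 6] [2, 4, 6] [2, 4, 6]
```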

edited Sep 09 '18 at 14:38

answered Sep 09 '18 at 14:07

jpp

Thanks, it worked. I am wondering how to measure the execution time of these two methods. In other words, which is the most efficient code to use? Please let me know. – Msquare Sep 10 '18 at 05:20
@Msquare, You can use `timeit`, e.g. [see here](https://stackoverflow.com/questions/8220801/how-to-use-timeit-module). I do not expect you to see a significant performance difference; you should first check whether this is truly the bottleneck in your application. – jpp Sep 10 '18 at 08:38
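Following the `timeit` suggestion, a sketch that times three ways of getting the negative rows on a synthetic frame of the question's size (80,000 rows, 6 float columns). The random data and the column names other than "Current" are assumptions; timings depend on the machine and pandas version, so no numbers are shown. The sketch first checks that the three approaches agree, which holds only because the synthetic frame has a default RangeIndex.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame: 80,000 rows, 6 float columns;
# standard-normal values, so roughly half of "Current" is negative.
rng = np.random.default_rng(0)
my_df = pd.DataFrame(
    rng.standard_normal((80_000, 6)),
    columns=["Current", "B", "C", "D", "E", "F"],
)

def with_boolean_series():
    # Original approach: boolean Series with [], then read the index labels
    return my_df[my_df["Current"] < 0].index.tolist()

def with_iloc_array():
    # Accepted answer's approach: iloc with a bare boolean array
    return my_df.iloc[(my_df["Current"] < 0).to_numpy()].index.tolist()

def with_np_where():
    # Positions straight from NumPy, no intermediate DataFrame
    return np.where(my_df["Current"].to_numpy() < 0)[0].tolist()

# Same result here only because my_df has a default RangeIndex (labels == positions)
assert with_boolean_series() == with_iloc_array() == with_np_where()

for fn in (with_boolean_series, with_iloc_array, with_np_where):
    seconds = timeit.timeit(fn, number=20) / 20
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```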

score 0 · Answer 2 · answered Sep 09 '18 at 13:47

You can simply use the following:

my_df.ix[my_df['Current']<0].index.values

Anant Gupta

It gave output with a warning: `.ix is deprecated. Please use .loc for label based indexing or .iloc for positional indexing` – Msquare Sep 09 '18 at 13:55
In that case use .loc instead of .ix – Anant Gupta Sep 09 '18 at 14:10
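A sketch of that replacement, on a hypothetical four-row stand-in for `my_df`: `.ix` was deprecated in pandas 0.20 and removed in 1.0, and `.loc` accepts the same boolean Series, so only the accessor name changes.

```python
import pandas as pd

# Hypothetical stand-in for my_df
my_df = pd.DataFrame({"Current": [1.0, -2.0, 0.5, -0.1]})

# Same expression as the .ix version, with .loc (label-based, accepts a boolean Series)
neg_labels = my_df.loc[my_df["Current"] < 0].index.values
print(neg_labels)  # NumPy array of index labels: [1 3]
```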