
I am trying to do a data analysis in which I import data as a numpy array of floats, some of which are below 0. I then select a column named load and want to find the indices where the values are > 0.1.

However, I am getting an error: "only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices"

What am I doing wrong, please?


import numpy as np
import pandas as pd

data=pd.read_csv('C1.txt',delim_whitespace=True , 
                 skiprows=10, skip_blank_lines=True ) 
data_array=data.to_numpy()
load=data_array[10:,1]

res=list()
for idx in load:
    if load[idx] > 0.1:
        res.append(idx)

I need to find the indices in the array where the values are over 0.1.

The start of the data array looks like this:

0.063   -0.00174    0.063   -0.00075
0.094    0.00628    0.094   -0.00089
0.125    0.01292    0.125   -0.00111
0.156   -0.00027    0.156    0.00015
0.188   -0.00319    0.188    0.00108
0.219   -0.00733    0.219   -0.0007
0.25    -0.02446    0.25    -0.00074
0.281   -0.01493    0.281   -0.00078
0.313    0.01339    0.313    0.00019
hacker315

2 Answers

2

This is done by indexing with a mask.

With numpy, since that is what you've asked for

np.where(load>0.1)[0]

The [0] is needed because np.where returns a tuple of index arrays, one per dimension.
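As a minimal sketch of the call above, here is what each step produces. The values in `load` are made up for illustration and are not taken from the asker's file.

```python
import numpy as np

# Hypothetical sample values standing in for the load column.
load = np.array([-0.00174, 0.25, 0.01292, 0.5, -0.00319, 0.11])

mask = load > 0.1        # boolean array with the same length as load
result = np.where(mask)  # tuple holding one index array per dimension
indices = result[0]      # load is 1-D, so the tuple has exactly one entry

print(type(result).__name__, len(result))  # tuple 1
print(indices)                             # [1 3 5]
```

Indexing `load[indices]` then gives back the values that passed the test.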

Note that I've ignored the 10: part of load, since I don't know why you want to skip the first 10 rows. What is certain is that you cannot index an array of 100 values with 90 booleans. The same goes for your method (a for loop): you can't iterate over the subarray [10:,1] and expect to get indices consistent with the full array.

So if, for some reason, you just want to ignore the first 10 rows (for example, because you know that in your file they are just rubbish), then

load=data_array[10:,1]
np.where(load>0.1)[0]+10

Here I add 10 to account for the fact that the indices refer to a subarray that starts at row 10.
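The offset can be checked on a small made-up array. This is only a sketch: the shape, the values, and the names `start`, `local_idx` and `global_idx` are assumptions for illustration.

```python
import numpy as np

# Hypothetical 2-D array: 15 rows, 2 columns; column 1 plays the role of load.
data_array = np.zeros((15, 2))
data_array[[2, 11, 13], 1] = 0.5  # rows 2, 11 and 13 exceed the threshold

start = 10
load = data_array[start:, 1]         # rows 10..14 only, so row 2 is skipped
local_idx = np.where(load > 0.1)[0]  # positions within the slice
global_idx = local_idx + start       # positions within data_array

print(local_idx)   # [1 3]
print(global_idx)  # [11 13]
```

Row 2 is above the threshold too, but it never shows up because it lies before the slice.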

With pandas directly

data.index[data.load>0.1]

Explanation: data.load is the load column. data.load>0.1 is a series of booleans (so here 100 booleans, since there are 100 rows, with the same index), each True iff the load field of the corresponding row is >0.1. And data.index[data.load>0.1] is the index restricted to the rows for which data.load>0.1 is True.
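A small sketch of the pandas version follows. The frame and its column names `time` and `load` are assumptions, since the real file's header is not shown. Note that `data.index[...]` returns index labels; they match row positions only for a default integer index, so the sketch also computes positions with `np.flatnonzero`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the column names are assumed, not read from C1.txt.
data = pd.DataFrame({
    "time": [0.063, 0.094, 0.125, 0.156],
    "load": [-0.00174, 0.2, 0.01292, 0.3],
})

mask = data["load"] > 0.1                  # boolean Series aligned with data.index
labels = data.index[mask]                  # index labels of the matching rows
positions = np.flatnonzero(mask.to_numpy())  # 0-based row positions

print(list(labels))        # [1, 3]
print(positions.tolist())  # [1, 3]
```

`data["load"]` is used here instead of `data.load` because bracket access keeps working even when a column name clashes with a DataFrame attribute.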

In pure python

So with your method, once corrected

for i in range(len(load)):
    if load[i]>0.1: res.append(i)

If you really insist on avoiding the iteration with a range,

for i,v in enumerate(load):
    if v>0.1: res.append(i)

Or, using a list comprehension

res=[i for i,v in enumerate(load) if v>0.1]

But that is not a good idea. The whole point of numpy/pandas is to avoid writing pure Python for loops and to let numpy/pandas do the loop for you (and therefore in C). Of course, under the hood, my numpy and pandas solutions do more or less the same for loop I wrote here. But they do it in C, which on large arrays is typically far faster, often by orders of magnitude.
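To compare the two approaches on your own machine, something like the following sketch could be used. The array size, the random data, and the helper names `with_loop` and `with_numpy` are all assumptions, and the timings depend on hardware, so no numbers are quoted here.

```python
import timeit
import numpy as np

# Hypothetical test data: 100,000 random values between -1 and 1.
rng = np.random.default_rng(0)
load = rng.uniform(-1.0, 1.0, size=100_000)

def with_loop(values):
    # Pure Python: one interpreter-level comparison per element.
    return [i for i, v in enumerate(values) if v > 0.1]

def with_numpy(values):
    # Vectorized: the comparison and the index search run in compiled code.
    return np.where(values > 0.1)[0]

# Both approaches should agree on the indices they return.
same = with_loop(load) == with_numpy(load).tolist()

loop_time = timeit.timeit(lambda: with_loop(load), number=5)
numpy_time = timeit.timeit(lambda: with_numpy(load), number=5)
print(same, loop_time, numpy_time)
```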

chrslg
  • Thank you very much for this explanation, it helps me learn a lot... – Igor Moravčík Jul 21 '23 at 13:41
  • But, for example, how do you find out that there even is a function for this, like "where"? Surely you don't go through the whole documentation. – Igor Moravčík Jul 21 '23 at 13:42
  • You don't know it on your first day, sure. But `.where` is quite a classic one, so it's hard to avoid seeing it after reading for a while. For example, I've just typed `numpy find index of boolean array` into Google right now, and the first result is [this one](https://stackoverflow.com/questions/16094563/numpy-get-index-where-value-is-true), whose first answer mentions `np.where`. Sure, in a different context. But from there, you know you can read the documentation for `np.where`. – chrslg Jul 21 '23 at 13:51
  • Thank you again for the answer. I wish everyone would explain things as well as you. Have a nice day. – Igor Moravčík Jul 21 '23 at 14:01
  • Just one last question: how did you know to put [0] in "np.where(load>0.1)[0]"? The documentation makes no mention of [0] or anything like it: https://numpy.org/doc/stable/reference/generated/numpy.where.html – Igor Moravčík Jul 21 '23 at 14:08
  • @IgorMoravčík As it says in the note at the top, if only the condition is given (as in this case), the function is shorthand for using `nonzero`. Looking at that documentation shows that it will return a tuple. – jared Jul 21 '23 at 16:10
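The tuple is easier to see on a 2-D input. The sketch below uses a made-up array named `grid` to show that `np.where` with only a condition matches `np.nonzero` and returns one index array per dimension.

```python
import numpy as np

# Hypothetical 2-D array, for illustration only.
grid = np.array([[0.0, 0.5],
                 [0.2, 0.0]])

where_result = np.where(grid > 0.1)      # condition only, no x and y arguments
nonzero_result = np.nonzero(grid > 0.1)  # the function np.where defers to here

rows, cols = where_result  # one index array per dimension
print(rows, cols)          # [0 1] [1 0]
```

For a 1-D array the tuple has a single entry, which is why `[0]` is enough to get the indices.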
0

Edited:

Here idx is not the index but the item at that index. You can do the following to get the index:

for idx,item in enumerate(load):
    if item > 0.1:
        res.append(idx)
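A short sketch that reproduces the error from the question and then applies the fix. The sample values are made up; the point is that iterating over a numpy array yields its elements (floats), and a float is not a valid index.

```python
import numpy as np

# Hypothetical sample values standing in for the load column.
load = np.array([0.25, -0.00174, 0.5])

error_message = None
try:
    for idx in load:  # idx is a float such as 0.25, not a position
        load[idx]     # indexing with a float raises IndexError
except IndexError as exc:
    error_message = str(exc)

print(error_message)

# enumerate yields (position, value) pairs, so idx is now an integer.
res = []
for idx, item in enumerate(load):
    if item > 0.1:
        res.append(idx)

print(res)  # [0, 2]
```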
  • Note that they do want the index, not the value (I made the same misreading initially). So that solution works, in the sense that it doesn't crash. But it returns a list of floating-point values, when they wanted a list of the indices of those floats in the initial array. – chrslg Jul 21 '23 at 13:27
  • As you pointed out, the code you mentioned does not work for me, since I need indices rather than just the values over 0.1. I want to use indices because I want to get rid of the initial data that is below 0.1, but after that I want all the data. – Igor Moravčík Jul 21 '23 at 13:55
  • @IgorMoravčík I have edited my answer – Muhammad Waqar Anwar Jul 22 '23 at 15:33