compare values of array with column in dataframe if equal get value of this row in other column

Question

I want to compare a column of a DataFrame with an array. If a value in the column equals a value in the array, the value from another column of that row should be saved in a new array.

My problem is that sometimes it can't find the row with the equal number even though there is one.

import numpy as np
import pandas as pd

input1=np.arange(0.,1.,0.1)
output1=np.arange(1.,0.,-0.1)
df1= pd.DataFrame(columns=['input', 'output'])
df1['input']=input1
df1['output']=output1
in1=np.arange(0.9,0.,-0.1)
in2=np.arange(0.,0.9,0.1)
in_func=np.concatenate((in1, in2), axis=0) 

b=np.zeros((len(in_func)))

for i in range(len(in_func)):
    a = df1.loc[df1['input']==in_func[i], 'output']
    b[i] = a.iloc[0]  #just for explaining my problem

The output for a is:

9    0.1
Name: output, dtype: float64
8    0.2
Name: output, dtype: float64
7    0.3
Name: output, dtype: float64
6    0.4
Name: output, dtype: float64
Series([], Name: output, dtype: float64)
Series([], Name: output, dtype: float64)
Series([], Name: output, dtype: float64)
Series([], Name: output, dtype: float64)
Series([], Name: output, dtype: float64)
0    1.0
Name: output, dtype: float64
1    0.9
Name: output, dtype: float64
2    0.8
Name: output, dtype: float64
3    0.7
Name: output, dtype: float64
4    0.6
Name: output, dtype: float64
5    0.5
Name: output, dtype: float64
6    0.4
Name: output, dtype: float64
7    0.3
Name: output, dtype: float64
8    0.2
Name: output, dtype: float64

I get the error "IndexError: single positional indexer is out-of-bounds" because some of the Series are empty, namely for in_func=[0.5, 0.4, 0.3, 0.2, 0.1]. I don't know why they are empty; the second time these values appear in in_func, it works.

Can someone help me? Thank you very much for your help.

score 0 · Accepted Answer · answered Aug 19 '19 at 15:34


The reason is that np.arange does not give consistent results for a non-integer step. The numpy.arange documentation says:

When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use numpy.linspace for these cases.
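As a minimal sketch of that suggestion, the two columns from the question could be built with numpy.linspace, which takes the number of points instead of a step. The point counts and endpoints below are read off the question's data. Note that two grids built in different ways may still differ in the last bits, so this alone does not guarantee that `==` will match.

```python
import numpy as np

# np.linspace takes the number of points instead of a step,
# and includes the stop value by default (endpoint=True).
input1 = np.linspace(0.0, 0.9, 10)   # 10 points from 0.0 to 0.9
output1 = np.linspace(1.0, 0.1, 10)  # 10 points from 1.0 down to 0.1

print(input1)
print(output1)
```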

I ran your program with some additional prints to see this. Hope this helps.

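import numpy as np
import pandas as pd

# Printing the whole array rounds the display and hides the error
in1=np.arange(0.,1.,0.1)
print(in1)
# Printing single elements shows the full floating-point value
print(in1[0])
print(in1[1])
print(in1[2])
print(in1[3])
print(in1[4])
out1=np.arange(1.,0.,-0.1)
print(out1)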

Output:
[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]
0.0
0.1
0.2
0.30000000000000004
0.4
[1.  0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1]
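One possible way to make the lookup from the question robust against this kind of rounding error is to compare within a tolerance instead of using `==`. The sketch below uses np.isclose with its default tolerances, which is an assumption about what counts as "equal" for this data; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same data as in the question
df1 = pd.DataFrame({
    "input": np.arange(0., 1., 0.1),
    "output": np.arange(1., 0., -0.1),
})
in_func = np.concatenate((np.arange(0.9, 0., -0.1), np.arange(0., 0.9, 0.1)))

b = np.zeros(len(in_func))
for i, value in enumerate(in_func):
    # np.isclose compares within a tolerance instead of bit-for-bit,
    # so 0.30000000000000004 still matches 0.3
    mask = np.isclose(df1["input"], value)
    b[i] = df1.loc[mask, "output"].iloc[0]

print(b)
```

If the tolerance is too loose for the real data, the `rtol` and `atol` arguments of np.isclose can be tightened.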

answered Aug 19 '19 at 15:34

dhanlin


Thank you for your quick answer. I solved the problem with numpy.linspace and numpy.around. Is there an easy way to just get the closest value? – fouk Aug 19 '19 at 15:57
Can you explain a bit more? What do you mean by an easy way? If this answer solves it, it is customary to accept and upvote the answer. – dhanlin Aug 19 '19 at 16:36
I'm sorry. Yes, this answer works for this minimized example, but in my real code the numbers in the DataFrame and in the array are not always exactly the same, so I get the same problem again. Rounding is not the nicest way and doesn't always work. So I was just wondering whether there is a function in numpy to get the closest number or whether I need to write my own function. – fouk Aug 19 '19 at 16:51
I found the answer to my question here: https://stackoverflow.com/questions/2566412/find-nearest-value-in-numpy-array. Thank you very much for your help. – fouk Aug 19 '19 at 17:57
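For readers with the same follow-up question, a nearest-value lookup can be sketched by taking the row with the smallest absolute difference. The helper name `nearest_output` and its parameters are hypothetical, chosen only for this illustration, and the sample value 0.52 is made up.

```python
import numpy as np
import pandas as pd

def nearest_output(df, value, key="input", target="output"):
    """Return df[target] from the row whose df[key] is closest to value.

    Hypothetical helper for illustration, not a pandas or NumPy function.
    """
    # Position of the smallest absolute difference
    pos = np.abs(df[key].to_numpy() - value).argmin()
    return df[target].iloc[pos]

df1 = pd.DataFrame({
    "input": np.linspace(0.0, 0.9, 10),
    "output": np.linspace(1.0, 0.1, 10),
})

# 0.52 is not in the column; the closest input is 0.5
result = nearest_output(df1, 0.52)
print(result)
```

This always returns a row, even when the closest value is far away, so a check on the distance may be needed if a missing match should be treated as an error.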
