In the first dataframe, which I'll call df1, the last two columns (shift_one and shift_two) together can be thought of as a guess at a potential true coordinate.
df1:
p_one p_two dist shift_one shift_two
0 Q8_CB Q2_C d_6.71823_Angs 26.821 179.513
1 Q8_CD Q2_C d_4.72003_Angs 179.799 179.514
....
The second dataframe, df2, holds experimentally observed coordinates, which I call peaks. It contains just the coordinates (A and B) plus one more column (C) for the signal intensity, which only needs to be carried along.
df2:
A B C
0 31.323 25.814 251106
1 26.822 26.083 690425
2 27.021 179.34 1409596
3 54.362 21.773 1413783
4 54.412 20.163 862750
....
For each guess in df1, I want to search df2 for a peak within 0.300 of that guess (shift_one against A, shift_two against B), and return the results in a new dataframe, say df3. In this example, row 0 of df1 matches row 2 of df2.
desired output, df3:
p_one p_two dist shift_one shift_two match match1 match2 match_inten
0 Q8_CB Q2_C d_6.71823_Angs 26.821 179.513 TRUE 27.021 179.34 1409596
1 Q8_CD Q2_C d_4.72003_Angs 179.799 179.514 NaN NaN NaN NaN
....
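One way to get from df1 and df2 to df3, as a minimal sketch: cross-join every guess with every peak, keep the pairs where shift_one is within 0.300 of A and shift_two is within 0.300 of B (an assumption read off the desired output), then left-join the hits back onto df1 so that guesses without a match get NaN. `merge(how="cross")` needs pandas 1.2 or later, and the sample data is only the rows printed above.

```python
import pandas as pd

TOL = 0.300  # matching window from the question

df1 = pd.DataFrame({
    "p_one": ["Q8_CB", "Q8_CD"],
    "p_two": ["Q2_C", "Q2_C"],
    "dist": ["d_6.71823_Angs", "d_4.72003_Angs"],
    "shift_one": [26.821, 179.799],
    "shift_two": [179.513, 179.514],
})
df2 = pd.DataFrame({
    "A": [31.323, 26.822, 27.021, 54.362, 54.412],
    "B": [25.814, 26.083, 179.34, 21.773, 20.163],
    "C": [251106, 690425, 1409596, 1413783, 862750],
})

# Pair every guess with every peak; reset_index keeps df1's row label
# in a column called "index" so matches can be joined back afterwards.
pairs = df1.reset_index().merge(df2, how="cross")

# Keep pairs where both coordinates fall inside the window (closed interval).
close = (
    ((pairs["shift_one"] - pairs["A"]).abs() <= TOL)
    & ((pairs["shift_two"] - pairs["B"]).abs() <= TOL)
)
hits = (
    pairs.loc[close, ["index", "A", "B", "C"]]
    .rename(columns={"A": "match1", "B": "match2", "C": "match_inten"})
    .assign(match=True)
)

# Left-join back onto df1 so unmatched guesses get NaN in the new columns.
df3 = df1.merge(hits.set_index("index"), left_index=True, right_index=True, how="left")
df3 = df3[list(df1.columns) + ["match", "match1", "match2", "match_inten"]]
print(df3)
```

If several peaks fall inside the window for the same guess, that guess appears once per matching peak in df3. A cross join also builds len(df1) × len(df2) intermediate rows, which can get large for big tables.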
I have attempted a few things:
(1) O'Reilly's Python in a Nutshell (p. 78) suggests handling bounds on values in a list with a lambda or a def, so I defined a bounds function like this:
def bounds(value, low, high):
    return low <= value <= high
I was then thinking that I could just add a new column, following the logic used here (https://stackoverflow.com/a/14717374/3767980).
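df1['match'] = (df2['A'] - 0.3 <= df1['shift_one']) & (df1['shift_one'] <= df2['A'] + 0.3)
I'm really struggling with this statement: it compares the two columns element by element (pandas raises an error if their indexes differ) rather than checking each guess against every peak.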
Next I would just pull the values, which should be trivial.
(2) Make new columns for the upper and lower limits, then run a conditional to check whether the value lies between the two columns.
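Both attempts come down to testing lower <= value <= upper for every guess/peak pair, not row by row. Here is a sketch of approach (2) under that reading: the limit columns are added to df1 (the names lo_one, hi_one, lo_two and hi_two are hypothetical), and NumPy broadcasting compares each guess against every peak at once. It assumes a match needs both shifts inside their windows, using closed intervals.

```python
import numpy as np
import pandas as pd

tol = 0.3
df1 = pd.DataFrame({"shift_one": [26.821, 179.799], "shift_two": [179.513, 179.514]})
df2 = pd.DataFrame({
    "A": [31.323, 26.822, 27.021, 54.362, 54.412],
    "B": [25.814, 26.083, 179.34, 21.773, 20.163],
    "C": [251106, 690425, 1409596, 1413783, 862750],
})

# Approach (2): lower/upper limit columns around each guess (hypothetical names)
df1["lo_one"] = df1["shift_one"] - tol
df1["hi_one"] = df1["shift_one"] + tol
df1["lo_two"] = df1["shift_two"] - tol
df1["hi_two"] = df1["shift_two"] + tol

# A (n_guesses, 1) array against an (n_peaks,) array broadcasts to
# (n_guesses, n_peaks): within[i, j] is True if guess i matches peak j.
A = df2["A"].to_numpy()
B = df2["B"].to_numpy()
within = (
    (df1["lo_one"].to_numpy()[:, None] <= A) & (A <= df1["hi_one"].to_numpy()[:, None])
    & (df1["lo_two"].to_numpy()[:, None] <= B) & (B <= df1["hi_two"].to_numpy()[:, None])
)

df1["match"] = within.any(axis=1)   # True if any peak lies inside both windows
first = within.argmax(axis=1)       # first matching peak per guess (0 when none)
df1["match1"] = np.where(df1["match"], A[first], np.nan)
df1["match2"] = np.where(df1["match"], B[first], np.nan)
df1["match_inten"] = np.where(df1["match"], df2["C"].to_numpy()[first], np.nan)
print(df1)
```

Unmatched guesses get `False` in `match` rather than NaN, and when several peaks qualify only the first is reported. Because `argmax` returns 0 for rows with no match, the pulled values are masked with `np.where`.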
Finally:
(a) Should I stay in pandas, or move to NumPy, SciPy, or plain Python arrays/lists? I also considered a regular Python list of lists. I'm wary of NumPy because I have text columns too: is NumPy limited to numbers and matrices?
(b) Any help would be appreciated. I used Biopython for phase_one and phase_two and pandas for phase_three, and I'm not sure which library is best for this final phase.
(c) It is probably fairly obvious that I'm an amateur programmer.
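On the worry in (a) that NumPy is only for numbers: NumPy arrays can hold strings, and an object array can even mix text and numbers, though arithmetic on object arrays is slow. A common split, sketched here with values taken from df1, is to keep the text columns in pandas and hand only the numeric columns to NumPy via `.to_numpy()`.

```python
import numpy as np
import pandas as pd

# A NumPy array of strings uses a fixed-width unicode dtype (kind 'U')
names = np.array(["Q8_CB", "Q8_CD"])
print(names.dtype.kind)

# An object array can mix text and numbers, at the cost of slower math
mixed = np.array([["Q8_CB", 26.821], ["Q8_CD", 179.799]], dtype=object)

# Common pattern: keep labels in the DataFrame, send only numbers to NumPy
df1 = pd.DataFrame({
    "p_one": ["Q8_CB", "Q8_CD"],
    "shift_one": [26.821, 179.799],
    "shift_two": [179.513, 179.514],
})
coords = df1[["shift_one", "shift_two"]].to_numpy()  # float64 array, shape (2, 2)
print(coords.dtype, coords.shape)
```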