
I have a data frame in R with around 1000 observations and two variables, row.m.z and row.retention.time, like this:

structure(list(row.m.z = c(301.14, 196.10, 132.10, 
160.13, 146.12, 166.09, 357.28, 230.06, 
307.2099609, 112.0537033, 220.13, 113.00, 120.08, 
261.11, 182.08, 410.33, 212.85, 248.12, 
176.88, 321.18), row.retention.time = c(6.1, 1.46, 
0.77, 0.94, 2.42, 0.94, 16.74, 1.61, 13.76, 1.61, 7.67, 0.74, 
2.42, 3.91, 1.25, 16.76, 0.69, 3.38, 0.73, 12.97)), row.names = c(NA, 
20L), class = "data.frame")

which looks like

    row.m.z row.retention.time
1  301.1400               6.10
2  196.1000               1.46
3  132.1000               0.77
4  160.1300               0.94
5  146.1200               2.42
6  166.0900               0.94
7  357.2800              16.74
8  230.0600               1.61
9  307.2100              13.76
10 112.0537               1.61
11 220.1300               7.67
12 113.0000               0.74
13 120.0800               2.42
14 261.1100               3.91
15 182.0800               1.25
16 410.3300              16.76
17 212.8500               0.69
18 248.1200               3.38
19 176.8800               0.73
20 321.1800              12.97

I need R to find pairs of rows that satisfy two conditions: row.retention.time should be the same in both rows, while row.m.z should differ by +/- 6 between the rows. In other words, I need to find observations in my data that have the same row.retention.time and a difference of 6 in row.m.z.

In the example data above, the code should find the two observations with a row.retention.time of 0.94 and row.m.z values of 166.09 and 160.13.

I have tried several grouping and filtering options, but all of them require me to specify a value or range of values for each variable, whereas what I want is to compare the values in the rows with each other.
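To make the row-against-row comparison concrete, here is a minimal base R sketch of one possible approach: join the data frame to itself on row.retention.time, then keep pairs whose row.m.z values differ by the target amount. The names `target`, `tol`, `id`, `cand`, `matched` and `result` are made up for illustration, and the tolerance of 0.05 is an assumption (the example pair differs by 5.96, not exactly 6).

```r
# Sample data from the question
df <- data.frame(
  row.m.z = c(301.14, 196.10, 132.10, 160.13, 146.12, 166.09, 357.28, 230.06,
              307.2099609, 112.0537033, 220.13, 113.00, 120.08, 261.11, 182.08,
              410.33, 212.85, 248.12, 176.88, 321.18),
  row.retention.time = c(6.1, 1.46, 0.77, 0.94, 2.42, 0.94, 16.74, 1.61, 13.76,
                         1.61, 7.67, 0.74, 2.42, 3.91, 1.25, 16.76, 0.69, 3.38,
                         0.73, 12.97)
)

target <- 6     # required difference in row.m.z (from the question)
tol    <- 0.05  # ASSUMPTION: how far from 6 the difference may be

# Remember the original row numbers so pairs can be traced back
df$id <- seq_len(nrow(df))

# Self-join: every combination of rows that share a retention time
cand <- merge(df, df, by = "row.retention.time", suffixes = c(".a", ".b"))

# Keep each pair once (id.a < id.b) and only if the m/z difference is ~6
keep <- cand$id.a < cand$id.b &
  abs(abs(cand$row.m.z.a - cand$row.m.z.b) - target) <= tol
matched <- cand[keep, ]

# Filtered version of the original data, ordered by retention time
result <- df[sort(unique(c(matched$id.a, matched$id.b))), ]
result <- result[order(result$row.retention.time, result$row.m.z), ]
print(result)
```

On the sample data this keeps only rows 4 and 6. Note that `merge` matches retention times by exact equality, which may be too strict if the values come out of a calculation rather than being read in as rounded numbers.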

Behnam Hedayat
Julia Rst
  • Welcome to SO, JuliaRst! "Same" when it comes to floating point is subjective (this is the case in most programming languages, because many decimal values cannot be stored exactly in a finite number of binary digits). For this, I suggest you determine a tolerance, so that two rows count as the same when they are within (say) `1e-5` of each other. However, we can't do anything with images, and they are discouraged (for data and code) for many reasons. Please read https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info, then [edit] your question and help us help you. Thanks! – r2evans May 25 '21 at 13:49
  • Hi Julia. You have explained it now, but one thing still remains unclear. Do you want only 2 such rows out of the 1000 rows? If not, how would you like to have them grouped? I mean, can there be more than 2 rows meeting the criteria in a single group? Can you provide the complete output you require for the sample given? – AnilGoyal May 25 '21 at 14:34
  • In the complete dataset of 1000 rows there will be more than 2 rows meeting the criteria, and I would like to group each pair of rows by retention time. Within the same group (rows with the same retention time), more than 2 rows could also meet the criteria. The output would be a filtered version of my data where only rows meeting the criteria are kept, grouped by retention time. For the sample given, the output would be only one pair of rows: [4] m.z. 160.13, retention.time 0.94 and [6] m.z. 166.09, retention.time 0.94. – Julia Rst May 26 '21 at 13:46
  • I have tried different things using dplyr, but these involved comparing consecutive rows within a column, whereas I need to compare all the values in the column to each other and find pairs with a difference of 6. – Julia Rst May 26 '21 at 13:47
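The two points raised in these comments (a tolerance for "same", and comparing all values in a column with each other rather than consecutive rows) can be sketched together in base R with `outer()`. This is only an illustration under stated assumptions: `find_pairs` and its arguments `target`, `mz_tol` and `rt_tol` are hypothetical names, `rt_tol = 1e-5` follows the value suggested in the comment, and `mz_tol = 0.05` is a guess based on the example pair differing by 5.96.

```r
# Sample data from the question
df <- data.frame(
  row.m.z = c(301.14, 196.10, 132.10, 160.13, 146.12, 166.09, 357.28, 230.06,
              307.2099609, 112.0537033, 220.13, 113.00, 120.08, 261.11, 182.08,
              410.33, 212.85, 248.12, 176.88, 321.18),
  row.retention.time = c(6.1, 1.46, 0.77, 0.94, 2.42, 0.94, 16.74, 1.61, 13.76,
                         1.61, 7.67, 0.74, 2.42, 3.91, 1.25, 16.76, 0.69, 3.38,
                         0.73, 12.97)
)

# Hypothetical helper: all row pairs with "equal" retention time
# (within rt_tol) and an m/z difference of target (within mz_tol)
find_pairs <- function(d, target = 6, mz_tol = 0.05, rt_tol = 1e-5) {
  # n x n matrices of absolute differences between every pair of rows
  mz_diff <- abs(outer(d$row.m.z, d$row.m.z, "-"))
  rt_diff <- abs(outer(d$row.retention.time, d$row.retention.time, "-"))

  hit <- abs(mz_diff - target) <= mz_tol & rt_diff <= rt_tol
  hit[lower.tri(hit, diag = TRUE)] <- FALSE  # report each pair only once

  idx <- which(hit, arr.ind = TRUE)
  data.frame(
    row_a = unname(idx[, "row"]),
    row_b = unname(idx[, "col"]),
    mz_a = d$row.m.z[idx[, "row"]],
    mz_b = d$row.m.z[idx[, "col"]],
    retention.time = d$row.retention.time[idx[, "row"]]
  )
}

res <- find_pairs(df)
print(res)

# Filtered version of the data: only rows that take part in some pair,
# ordered by retention time
kept <- df[sort(unique(c(res$row_a, res$row_b))), ]
kept <- kept[order(kept$row.retention.time, kept$row.m.z), ]
print(kept)
```

For about 1000 rows the difference matrices have roughly a million entries each, which should fit comfortably in memory. A group with more than two matching rows simply shows up as several rows in `res`.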

0 Answers