I have a list of data.frames, each with two columns, time and signal. The data.frames are the results of GC analysis of a process that was periodically sampled.

I want to compare the GC data I've collected.

I've written a function to convert the times and peak areas into percentage areas (excluding solvent peak) and relative retention times.

Due to the nature of the process, different GCs have differing numbers of peaks and therefore comparison isn't straightforward. Impurities appear at different parts of my process and hence give extra peaks.

I want to go over my list and find the longest vector of relative retention times (no problem). I want to use the longest vector as a comparator and, in each of the other vectors, insert an NA at every position where the comparator has a relative retention time with no corresponding peak.

Hence the results of the following list of relative retention times,

prac  <- list(a=c(0.203,0.305,0.444,0.780,1.000,1.101,1.403),
          b=c(0.201,0.306,0.442,0.778,1.000,1.101,1.208,1.401))

where b is the comparator vector, should look like

0.203 0.305 0.444 0.780 1.000 1.101 NA    1.403
0.201 0.306 0.442 0.778 1.000 1.101 1.208 1.401

Can anyone suggest how I might be able to start?

My first thought was a for loop but I don't think that will work. Please note that sometimes more than one NA value is required.

(I plan to collate the percentage areas against the comparator relative retention times for all the chromatograms, if only I can get beyond this problem).

grrgrrbla
DarrenRhodes
  • I have no idea what you actually want. It seems simple, but behind all this retention-comparator fog I can't see the basic building blocks of your problem. So: why is the 7th value of your list NA? What condition makes it NA? How do you want to filter, select, or apply a condition to your list prac? I have no idea how you get from prac to the "comparator vector". – grrgrrbla Apr 29 '15 at 14:10
  • @grrgrrbla initial list, prac <- list(a=c(0.203,0.305,0.444,0.780,1.000,1.101,1.403), b=c(0.201,0.306,0.442,0.778,1.000,1.101,1.208,1.401)); since the longest vector is b, this is the comparator. I want to put an NA at the point where there isn't a corresponding similar number. this can be seen in the output here, 0.203 0.305 0.444 0.780 1.000 1.101 NA 1.403 0.201 0.306 0.442 0.778 1.000 1.101 1.208 1.401 both input and output have been copied and pasted from the text above which will most probably make a mess of the formatting. – DarrenRhodes Apr 29 '15 at 14:13
  • What does similar mean? Exactly equal, or equal within what bounds? None of the values at matching indices of vectors a and b (except [5] and [6]) are exactly equal. A general tip: try to abstract away from your "special names" and look at the abstract properties of your problem, so that people who have no idea how time-signal processing works (like me) have an easier time helping you. Nobody answered here for 1 hour, which is unusual on SO and a sign that you didn't state your problem clearly. – grrgrrbla Apr 29 '15 at 14:16
  • @grrgrrbla equal inside bounds (the signal drifts from chromatogram to chromatogram; I can do the bounds). Further, I think my answer will be along the lines of http://stackoverflow.com/questions/18951248/insert-elements-in-a-vector-in-r and http://stackoverflow.com/questions/1493969/how-to-insert-elements-into-a-vector but I still need anyone else's input at the moment. – DarrenRhodes Apr 29 '15 at 14:20
  • Which elements should be compared between the two vectors? The ones that are closest? – grrgrrbla Apr 29 '15 at 14:24
  • @grrgrrbla yes. The numbers represent peaks which represent impurities or product (1.000). The relative retention time drifts slightly and so it isn't possible to directly compare the elements of one vector with the other; instead, to use your phrase, I have to use bounds to compare the elements of the vectors with the comparator vector. – DarrenRhodes Apr 29 '15 at 14:41

1 Answer


This is one solution (brute force) with just one missing value:

prac  <- list(a=c(0.203,0.305,0.444,0.780,1.000,1.101,1.403),
              b=c(0.201,0.306,0.442,0.778,1.000,1.101,1.208,1.401))


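# First position where a and b differ by more than the tolerance
NA.index <- which(abs(prac$b[seq_along(prac$a)] - prac$a) > 0.05)[1]
# Insert NA before that position (note that 1:NA.index-1 parses as (1:NA.index)-1)
newlist.a <- append(prac$a, NA, after = NA.index - 1)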

This should be generalizable (depending on how your data is actually structured):

prac  <- list(a=c(0.203,0.305,0.444,0.780,1.000,1.101,1.403),
              b=c(0.201,0.306,0.442,0.778,1.000,1.101,1.208,1.401))

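a <- prac$a
for (i in seq_along(prac$b)) {
    # No peak in a close to b[i]: insert NA at position i, shifting the rest right.
    # Assumes every peak in a has a counterpart in b.
    if (i > length(a) || abs(prac$b[i] - a[i]) >= 0.05) {
        a <- append(a, NA, after = i - 1)
    }
}
prac$a <- a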

Kinda hard to tell how to generalize this to examples with multiple NAs without you giving another reproducible example, because right now I am just stabbing in the dark about how your data is structured.
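
One way the idea might extend to several missing peaks is to match by nearest value instead of by position. The sketch below is only an illustration under assumptions: the helper name align_to_comparator and the tolerance argument tol are hypothetical, and it assumes that each comparator peak has at most one peak of the other vector within tol.

```r
# Hypothetical helper: for each comparator peak in ref, keep the nearest
# value of x if it lies within +/- tol, otherwise put NA in that slot.
align_to_comparator <- function(x, ref, tol = 0.005) {
    out <- rep(NA_real_, length(ref))
    for (i in seq_along(ref)) {
        d <- abs(x - ref[i])
        j <- which.min(d)                      # nearest peak in x to ref[i]
        if (length(j) == 1 && d[j] <= tol) out[i] <- x[j]
    }
    out
}

prac <- list(a = c(0.203, 0.305, 0.444, 0.780, 1.000, 1.101, 1.403),
             b = c(0.201, 0.306, 0.442, 0.778, 1.000, 1.101, 1.208, 1.401))

aligned_a <- align_to_comparator(prac$a, prac$b)
print(aligned_a)
```

Because the result always has the length of the comparator, any number of gaps ends up as NA without shifting elements by hand. The tolerance would need tuning to how far the relative retention times drift in the real data.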

grrgrrbla
  • Boundary +/- 0.005. Which elements in what order? The peak at rrt 1.000 is the same for each chromatogram; the peaks that spread out from it are to be compared. Hence, prac$a[1:5] and prac$b[1:5] can be compared, but prac$a[6:7] requires an NA to be comparable to prac$b[6:8]. My problem is how to get that NA in ... If I can do that I should be able to extend the function to more difficult cases. – DarrenRhodes Apr 29 '15 at 14:50
  • Please give me another example from your data with 2 or more expected NA values. – grrgrrbla Apr 29 '15 at 15:39
  • Here's a dput output, structure(list(a = c(0.203, 0.305, 0.444, 0.78, 1, 1.101, 1.403 ), b = c(0.201, 0.306, 0.442, 0.778, 1, 1.101, 1.208, 1.401), d = c(0.201, 0.306, 0.778, 1, 1.101, 1.208, 1.401), e = c(0.201, 0.442, 0.778, 1, 1.101, 1.401), f = c(0.442, 0.778, 1, 1.101, 1.208, 1.401)), .Names = c("a", "b", "d", "e", "f")) where 'a' and 'b' are as before but d, e, and f have missing values in different parts of the sequence. (But you've done a lot to answer the question: in fact, you've answered the original question, which I'll close over the next couple of days.) – DarrenRhodes Apr 29 '15 at 21:32
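
For the five-vector example in the comment above, a possible sketch is to pick the longest vector as the comparator and align every vector to it in one pass. The names rrt, tol, ref and aligned are illustrative only, the +/- 0.005 window is taken from the comments, and nearest-value matching is an assumption about what "similar" should mean here.

```r
rrt <- list(a = c(0.203, 0.305, 0.444, 0.78, 1, 1.101, 1.403),
            b = c(0.201, 0.306, 0.442, 0.778, 1, 1.101, 1.208, 1.401),
            d = c(0.201, 0.306, 0.778, 1, 1.101, 1.208, 1.401),
            e = c(0.201, 0.442, 0.778, 1, 1.101, 1.401),
            f = c(0.442, 0.778, 1, 1.101, 1.208, 1.401))

tol <- 0.005                             # assumed matching window
ref <- rrt[[which.max(lengths(rrt))]]    # longest vector is the comparator

aligned <- t(sapply(rrt, function(x) {
    d <- abs(outer(ref, x, "-"))         # rows: comparator peaks, cols: peaks in x
    j <- apply(d, 1, which.min)          # nearest peak in x for each comparator peak
    hit <- d[cbind(seq_along(ref), j)] <= tol
    ifelse(hit, x[j], NA_real_)          # keep the match, otherwise NA
}))
colnames(aligned) <- ref
print(aligned)
```

The result is a matrix with one row per chromatogram and one column per comparator peak, which may be a convenient shape for collating percentage areas later. If two comparator peaks ever sit closer together than the tolerance, the same peak could be matched twice, so that case would need extra handling.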