R: Delete incomplete data sequences

Question

I have logged measurements of three Soil CO2 sampling points coming to one measuring machine though three valves. There are three measurements at each valve (from each sampling point). The machine was sun powered so there was many power losses during measurement. When the power came back the measurements started again with Valve 1 and Measurement 1. Sometimes, but not always (e.g. row #27), there is a row with batt. batt. batt. sign (showing empty batteries). I would like to choose only data with perfect sequences from 1. to 2. valve.

I have data that look slike this:

# Measurement   Valve   Value
#1         1        1   0.123
#2         2        1   0.124
#3         3        1   0.125
#4         1        2   0.126
#5         2        2   0.127
#6         3        2   0.128
#7         1        3   0.129
#8         2        3   0.13
#9         3        3   0.131
#10    batt.    batt.   batt.
#11        1        1   0.132
#12        2        1   0.133
#13        3        1   0.134
#14        1        2   0.135
#15        2        2   0.136
#16        3        2   0.137
#17        1        3   0.138
#18        2        3   0.139
#19     bat.     bat.    bat.
#20        1        1   0.141
#21     bat.     bat.    bat.
#22        1        1   0.141
#23        2        1   0.142
#24        3        1   0.143
#25     bat.     bat.    bat.
#26        1        1   0.141
#27        1        1   0.141
#28        2        1   0.142
#29        3        1   0.143
#30        1        2   0.144
#31        2        2   0.145
#32        3        2   0.146
#33        1        3   0.147
#34        2        3   0.148
#35        3        3   0.149

I would like to get data like this:

#   Measurement Valve   Value
#1            1     1   0.123
#2            2     1   0.124
#3            3     1   0.125
#4            1     2   0.126
#5            2     2   0.127
#6            3     2   0.128
#7            1     1   0.132
#8            2     1   0.133
#9            3     1   0.134
#10           1     2   0.135
#11           2     2   0.136
#12           3     2   0.137
#13           1     1   0.141
#14           2     1   0.142
#15           3     1   0.143
#16           1     2   0.144
#17           2     2   0.145
#18           3     2   0.146

Input data:

DF = structure(list(Measurement = c("1", "2", "3", "1", "2", "3", 
"1", "2", "3", "batt.", "1", "2", "3", "1", "2", "3", "1", "2", 
"bat.", "1", "bat.", "1", "2", "3", "bat.", "1", "1", "2", "3", 
"1", "2", "3", "1", "2", "3"), Valve = c("1", "1", "1", "2", 
"2", "2", "3", "3", "3", "batt.", "1", "1", "1", "2", "2", "2", 
"3", "3", "bat.", "1", "bat.", "1", "1", "1", "bat.", "1", "1", 
"1", "1", "2", "2", "2", "3", "3", "3"), Value = c("0.123", "0.124", 
"0.125", "0.126", "0.127", "0.128", "0.129", "0.13", "0.131", 
"batt.", "0.132", "0.133", "0.134", "0.135", "0.136", "0.137", 
"0.138", "0.139", "bat.", "0.141", "bat.", "0.141", "0.142", 
"0.143", "bat.", "0.141", "0.141", "0.142", "0.143", "0.144", 
"0.145", "0.146", "0.147", "0.148", "0.149")), .Names = c("Measurement", 
"Valve", "Value"), row.names = c(NA, -35L), class = "data.frame")

`yourdf[ yourdf == "bat." ] <- NA; yourdf[complete.cases(yourdf), ]` — Paulo E. Cardoso, Oct 20 '15 at 12:14
Can you explain how you approached this and where you got stuck? — Roland, Oct 20 '15 at 12:17
You can do this by using a loop by comparing the previous number to check if the 1-2-3 pattern is followed... the 'batt.' will also be taken care of... this way is not very efficient though — Gaurav, Oct 20 '15 at 12:24
I don't know why no one has asked yet, but an easily reproducible example would be nice, one that can be copy-pasted into R. Some guidance: http://stackoverflow.com/a/28481250/1191259 Also, if would be good, as Roland said, to illustrate what you've tried so far so it's possible to help you with whatever problem you've run into along the way. There are plenty of ways to achieve your goal... — Frank, Oct 20 '15 at 12:38
did you mistype 'batt.'? the logger return both 'bat.' and batt.'? — Paulo E. Cardoso, Oct 20 '15 at 12:38
Can you confirm the desired output? Per question you state "I would like to choose only data with perfect sequences from 1. to 2. valve." Yet, in the desired output, you've eliminated duplicates. — Minnow, Oct 20 '15 at 13:11
Well I am wery new in R so I did not try a lot of posibilites, but I have been searching this and other sites for simillar questions. I have been trying stuff like: setDT(a)[,.SD[.N>=3], by=Ventil] to get only perfecty measured valves, but than I got stuck how to choose the perfect sequence of valves and measurements. I tried to generate the sequences and merge to get NA-s but that only works for certain number of repetitions. I was thinking to put all the data in reverse order, than find the valve2 measurement3 and select that row and the next 5rows, but I dont know how to do that. — B. Krajnc, Oct 20 '15 at 13:28

Frank · Accepted Answer · 2015-10-20T13:18:45.033

Well, you could create a grouping column:

library(data.table)
setDT(DF)

DF[, g := cumsum(grepl("bat",DF$Value))]

    Measurement Valve Value g
 1:           1     1 0.123 0
 2:           2     1 0.124 0
 3:           3     1 0.125 0
 4:           1     2 0.126 0
 5:           2     2 0.127 0
 6:           3     2 0.128 0
 7:           1     3 0.129 0
 8:           2     3  0.13 0
 9:           3     3 0.131 0
10:       batt. batt. batt. 1
11:           1     1 0.132 1
12:           2     1 0.133 1
13:           3     1 0.134 1
14:           1     2 0.135 1
15:           2     2 0.136 1
16:           3     2 0.137 1
17:           1     3 0.138 1
18:           2     3 0.139 1
19:        bat.  bat.  bat. 2
20:           1     1 0.141 2
21:        bat.  bat.  bat. 3
22:           1     1 0.141 3
23:           2     1 0.142 3
24:           3     1 0.143 3
25:        bat.  bat.  bat. 4
26:           1     1 0.141 4
27:           1     1 0.141 4
28:           2     1 0.142 4
29:           3     1 0.143 4
30:           1     2 0.144 4
31:           2     2 0.145 4
32:           3     2 0.146 4
33:           1     3 0.147 4
34:           2     3 0.148 4
35:           3     3 0.149 4
    Measurement Valve Value g

and then select groups

DF2 <- DF[, if (all(c("1","2") %in% Valve)) unique(.SD[Valve %in% c("1","2")]), by = g]

    g Measurement Valve Value
 1: 0           1     1 0.123
 2: 0           2     1 0.124
 3: 0           3     1 0.125
 4: 0           1     2 0.126
 5: 0           2     2 0.127
 6: 0           3     2 0.128
 7: 1           1     1 0.132
 8: 1           2     1 0.133
 9: 1           3     1 0.134
10: 1           1     2 0.135
11: 1           2     2 0.136
12: 1           3     2 0.137
13: 4           1     1 0.141
14: 4           2     1 0.142
15: 4           3     1 0.143
16: 4           1     2 0.144
17: 4           2     2 0.145
18: 4           3     2 0.146

If rows 26 & 27 could've had different Values, you can select the first with unique(.SD[Valve %in% c("1","2")], by=c("Measurement", "Valve")).

It worked..exept that I cant get rid of the 26&27. I would like to have only 27 — B. Krajnc, Oct 21 '15 at 09:22
@B.Krajnc I'm not sure I get what you mean. There's only one such row (with `1 1 0.141`) in my result above, right? — Frank, Oct 21 '15 at 12:08
Well I will try again. As I said I am very new to R so a lot of stuff is sill unclear to me. But when I put my data trough: DF2 <- DF[, if (all(c("1","2") %in% Valve)) unique(.SD[Valve %in% c("1","2")]), by = g] I get two rows with 1 1 0.141 And I still did not figure it out how to use: unique(.SD[Valve %in% c("1","2")], by=c("Measurement", "Valve")) Should I substitute it in the previous command? — B. Krajnc, Oct 21 '15 at 15:18
Please try it with the `dput` data I added to your question. You should not see a repeated row in that case. If your data differs from that `dput`, please replace that part of your answer with your actual data. The link I mentioned before should help with that: http://stackoverflow.com/a/28481250/1191259 — Frank, Oct 21 '15 at 15:21
Thank You! Now it is working! I gues I misstiped something before! — B. Krajnc, Oct 22 '15 at 10:00

R: Delete incomplete data sequences

1 Answers1