I find this a perfect case for using factor and setting its levels carefully. I'll use data.table here with this idea. Make sure your value column is character (not an absolute requirement).
Step 1: Convert your data.frame to a data.table, keeping only the unique rows.
library(data.table)
dt <- as.data.table(unique(df))
setkey(dt, "depth") # just to be sure before factoring "value"
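Step 2: Convert value to a factor and coerce it to numeric. Make sure to set the levels yourself (this is important): by default factor sorts the levels, whereas here they must follow the order of first appearance once the rows are sorted by depth.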
dt[, id := as.numeric(factor(value, levels = unique(value)))]
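To see why the explicit levels matter, here is a minimal sketch on a made-up vector (the toy value vector below is an assumption for illustration, not the data from the question). It compares the default sorted levels with levels taken in order of first appearance:

```r
# Toy input, assumed for illustration only
value <- c("b", "a", "b", "c")

# Default: levels are sorted ("a", "b", "c"), so "b" gets code 2
default_codes <- as.numeric(factor(value))
default_codes
# 2 1 2 3

# Explicit levels: order of first appearance ("b", "a", "c"), so "b" gets code 1
ordered_codes <- as.numeric(factor(value, levels = unique(value)))
ordered_codes
# 1 2 1 3
```

With levels = unique(value), a larger code always means "first seen later", which is what the later steps rely on.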
Step 3: Set the key to depth and id, so that rows are sorted by id within each depth, then pick the last row for each depth (the one with the largest id).
setkey(dt, "depth", "id")
dt.out <- dt[J(unique(depth)), mult="last"][, value := NULL]
#    depth id
# 1:     1  2
# 2:     2  2
# 3:     3  3
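For readers without data.table at hand, the "sort, then keep the last row per depth" idea can be sketched in base R. This is only an analogue of the keyed join with mult="last", not the same mechanism, and the small data frame x is made up for illustration:

```r
# Toy input, assumed for illustration only
x <- data.frame(depth = c(1, 1, 2, 2), id = c(2, 1, 3, 4))

# Sort by depth, then by id within depth (the role setkey plays above)
x <- x[order(x$depth, x$id), ]

# Keep the last row of each depth: after sorting, that is the largest id
last_rows <- x[!duplicated(x$depth, fromLast = TRUE), ]
last_rows
# keeps id 2 for depth 1 and id 4 for depth 2
```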
Step 4: Since the value at each depth should be at least the value at the previous depth, use cummax to get the final output.
dt.out[, id := cummax(id)]
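As a quick illustration of what cummax does here, a minimal sketch on a made-up vector of per-depth ids (the numbers are chosen for illustration): each element is replaced by the running maximum, so a drop at a deeper level is lifted back up to the previous maximum.

```r
# Per-depth ids, assumed for illustration; the last one drops back to 1
ids <- c(2, 4, 4, 5, 6, 1)

running_max <- cummax(ids)
running_max
# 2 4 4 5 6 6
```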
Edit: The above code was for illustrative purposes. In reality you don't need a third column at all. This is how I'd write the final code:
library(data.table)
dt <- as.data.table(unique(df))
setkey(dt, "depth")
dt[, value := as.numeric(factor(value, levels = unique(value)))]
setkey(dt, "depth", "value")
dt.out <- dt[J(unique(depth)), mult="last"]
dt.out[, value := cummax(value)]
Here's a trickier example and the output from the code:
df <- structure(list(depth = c(1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 6),
                     value = structure(c(1L, 2L, 3L, 4L, 1L, 3L, 4L, 5L, 6L, 1L, 1L),
                                       .Label = c("a", "b", "c", "d", "f", "g"), class = "factor")),
                .Names = c("depth", "value"), row.names = c(NA, -11L),
                class = "data.frame")
#    depth value
# 1:     1     2
# 2:     2     4
# 3:     3     4
# 4:     4     5
# 5:     5     6
# 6:     6     6
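As a cross-check, the same four steps can be sketched in base R on this example. This is not the data.table code above, just an equivalent under the assumption that taking the maximum id per depth (via tapply) matches picking the last row after sorting; the names u, last_id and out are made up for the sketch.

```r
# Same data as the tricky example, with value as plain character
df <- data.frame(
  depth = c(1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 6),
  value = c("a", "b", "c", "d", "a", "c", "d", "f", "g", "a", "a"),
  stringsAsFactors = FALSE
)

# Step 1: unique rows, sorted by depth (order() keeps ties in original order)
u <- unique(df[order(df$depth), ])

# Step 2: numeric codes in order of first appearance
u$id <- as.numeric(factor(u$value, levels = unique(u$value)))

# Step 3: largest id within each depth
last_id <- tapply(u$id, u$depth, max)

# Step 4: running maximum across increasing depth
out <- data.frame(depth = as.numeric(names(last_id)),
                  value = cummax(as.vector(last_id)))
out$value
# 2 4 4 5 6 6
```

The result agrees with the data.table output shown above, including depth 6, where the raw id is 1 and cummax carries the 6 forward.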