Looping to subset data using depth criteria in r

Question

I am trying to write a looping function to subset these data below :

DEPTH   A       B       C       D       E       F
4700    8.75    9.313   0.12    0.138   89.164  72.613
4700.5  8.75    9.264   0.117   0.135   89.266  72.784
4701    8.75    9.376   0.112   0.132   89.52   68.443
4701.5  8.75    9.485   0.11    0.122   89.088  64.839
4702    8.75    9.564   0.116   0.108   89.377  64.388
4702.5  8.75    9.572   0.121   0.098   88.93   66.931
4703    8.75    9.524   0.122   0.093   89.651  70.906
4703.5  8.75    9.395   0.124   0.091   90.486  75.106
4704    8.75    9.245   0.123   0.089   90.598  77.443
4704.5  8.75    9.298   0.124   0.087   91.251  78.93
4705    8.75    9.361   0.125   0.088   90.319  77.159
4705.5  8.75    9.454   0.123   0.088   88.176  75.999
4706    8.75    9.448   0.124   0.088   86.129  78.843
4706.5  8.75    9.359   0.124   0.096   85.581  77.067
4707    8.75    9.305   0.119   0.12    85.082  73.191
4707.5  8.75    9.16    0.113   0.16    85.738  78.425
4708    8.75    9.036   0.097   0.208   86.114  91.491
4708.5  8.75    9.126   0.089   0.237   89.779  97.706
4709    8.75    9.111   0.094   0.224   92.429  91.557
4709.5  8.75    9.119   0.106   0.195   91.663  85.642
4710    8.75    9.234   0.143   0.185   91.881  83.705
4710.5  8.75    9.468   0.172   0.172   92.526  82.094
4711    8.75    9.59    0.187   0.139   94.544  85.973
4711.5  8.75    9.364   0.304   0.106   97.261  88.345
4712    8.75    9.145   0.458   0.089   98.726  78.622
4712.5  8.75    8.97    0.463   0.071   99.372  74.403
4713    8.75    8.985   0.384   0.064   99.343  82.743
4713.5  8.75    9.021   0.321   0.098   98.377  89.484
4714    8.75    9.148   0.247   0.133   95.209  93.148
4714.5  8.75    9.352   0.181   0.129   87.194  99.743
4715    8.75    9.427   0.147   0.104   83.613  109.798

into:

subset1
    DEPTH   A       B       C       D       E       F
    4700    8.75    9.313   0.12    0.138   89.164  72.613
    4700.5  8.75    9.264   0.117   0.135   89.266  72.784
    4701    8.75    9.376   0.112   0.132   89.52   68.443
    4701.5  8.75    9.485   0.11    0.122   89.088  64.839
    4702    8.75    9.564   0.116   0.108   89.377  64.388
    4702.5  8.75    9.572   0.121   0.098   88.93   66.931
    4703    8.75    9.524   0.122   0.093   89.651  70.906

subset2
    DEPTH   A       B       C       D       E       F
    4703    8.75    9.524   0.122   0.093   89.651  70.906
    4703.5  8.75    9.395   0.124   0.091   90.486  75.106
    4704    8.75    9.245   0.123   0.089   90.598  77.443
    4704.5  8.75    9.298   0.124   0.087   91.251  78.93
    4705    8.75    9.361   0.125   0.088   90.319  77.159
    4705.5  8.75    9.454   0.123   0.088   88.176  75.999
    4706    8.75    9.448   0.124   0.088   86.129  78.843

subset3
    DEPTH   A       B       C       D       E       F
    4706    8.75    9.448   0.124   0.088   86.129  78.843
    4706.5  8.75    9.359   0.124   0.096   85.581  77.067
    4707    8.75    9.305   0.119   0.12    85.082  73.191
    4707.5  8.75    9.16    0.113   0.16    85.738  78.425
    4708    8.75    9.036   0.097   0.208   86.114  91.491
    4708.5  8.75    9.126   0.089   0.237   89.779  97.706
    4709    8.75    9.111   0.094   0.224   92.429  91.557

subset4
    DEPTH   A       B       C       D       E       F
    4709    8.75    9.111   0.094   0.224   92.429  91.557
    4709.5  8.75    9.119   0.106   0.195   91.663  85.642
    4710    8.75    9.234   0.143   0.185   91.881  83.705
    4710.5  8.75    9.468   0.172   0.172   92.526  82.094
    4711    8.75    9.59    0.187   0.139   94.544  85.973
    4711.5  8.75    9.364   0.304   0.106   97.261  88.345
    4712    8.75    9.145   0.458   0.089   98.726  78.622

subset5
    DEPTH   A       B       C       D       E       F
    4712    8.75    9.145   0.458   0.089   98.726  78.622
    4712.5  8.75    8.97    0.463   0.071   99.372  74.403
    4713    8.75    8.985   0.384   0.064   99.343  82.743
    4713.5  8.75    9.021   0.321   0.098   98.377  89.484
    4714    8.75    9.148   0.247   0.133   95.209  93.148
    4714.5  8.75    9.352   0.181   0.129   87.194  99.743
    4715    8.75    9.427   0.147   0.104   83.613  109.798

Can someone please help me?, I have no luck to find the way so far. I need to subset the data for every 3 foot interval.

Also, is it correct that the first row of one subset is the same as the last row of the previous subset? — SymbolixAU, Mar 23 '16 at 02:51

score 0 · Answer 1 · answered Mar 23 '16 at 03:20

The cut function cuts a continuous vector into a factor of levels determined by the breaks you pass it. In this case, you just need a sequence that runs from min(df$DEPTH) to max(df$DEPTH) by 3, so pass that to cut:

df$breaks <- cut(df$DEPTH, seq(min(df$DEPTH), max(df$DEPTH), by = 3), include.lowest = TRUE)
head(df)
#    DEPTH    A     B     C     D      E      F      breaks
# 1 4700.0 8.75 9.313 0.120 0.138 89.164 72.613 [4700,4703]
# 2 4700.5 8.75 9.264 0.117 0.135 89.266 72.784 [4700,4703]
# 3 4701.0 8.75 9.376 0.112 0.132 89.520 68.443 [4700,4703]
# 4 4701.5 8.75 9.485 0.110 0.122 89.088 64.839 [4700,4703]
# 5 4702.0 8.75 9.564 0.116 0.108 89.377 64.388 [4700,4703]
# 6 4702.5 8.75 9.572 0.121 0.098 88.930 66.931 [4700,4703]

Now we've got a column by which to divide our data, so we can split it into separate data.frames by subsetting within lapply, removing the added column at the same time with -ncol(df), as it was the last column added:

lapply(levels(df$breaks), function(x){df[df$breaks == x, -ncol(df)]})
# [[1]]
#    DEPTH    A     B     C     D      E      F
# 1 4700.0 8.75 9.313 0.120 0.138 89.164 72.613
# 2 4700.5 8.75 9.264 0.117 0.135 89.266 72.784
# 3 4701.0 8.75 9.376 0.112 0.132 89.520 68.443
# 4 4701.5 8.75 9.485 0.110 0.122 89.088 64.839
# 5 4702.0 8.75 9.564 0.116 0.108 89.377 64.388
# 6 4702.5 8.75 9.572 0.121 0.098 88.930 66.931
# 7 4703.0 8.75 9.524 0.122 0.093 89.651 70.906
# 
# [[2]]
#     DEPTH    A     B     C     D      E      F
# 8  4703.5 8.75 9.395 0.124 0.091 90.486 75.106
# 9  4704.0 8.75 9.245 0.123 0.089 90.598 77.443
# 10 4704.5 8.75 9.298 0.124 0.087 91.251 78.930
# 11 4705.0 8.75 9.361 0.125 0.088 90.319 77.159
# 12 4705.5 8.75 9.454 0.123 0.088 88.176 75.999
# 13 4706.0 8.75 9.448 0.124 0.088 86.129 78.843
# .....

Alternately, you can wrap the process all into one by cutting inside of split, but you won't be able to remove the new column as simply.

split(df, cut(df$DEPTH, seq(min(df$DEPTH), max(df$DEPTH), by = 3), include.lowest = TRUE))
# $`[4700,4703]`
#    DEPTH    A     B     C     D      E      F      breaks
# 1 4700.0 8.75 9.313 0.120 0.138 89.164 72.613 [4700,4703]
# 2 4700.5 8.75 9.264 0.117 0.135 89.266 72.784 [4700,4703]
# 3 4701.0 8.75 9.376 0.112 0.132 89.520 68.443 [4700,4703]
# 4 4701.5 8.75 9.485 0.110 0.122 89.088 64.839 [4700,4703]
# 5 4702.0 8.75 9.564 0.116 0.108 89.377 64.388 [4700,4703]
# 6 4702.5 8.75 9.572 0.121 0.098 88.930 66.931 [4700,4703]
# 7 4703.0 8.75 9.524 0.122 0.093 89.651 70.906 [4700,4703]
# 
# $`(4703,4706]`
#    DEPTH    A     B     C     D      E      F      breaks
# 8 4703.5 8.75 9.395 0.124 0.091 90.486 75.106 (4703,4706]
# . ......

thanks alistaire. however by this way, the second subset does not start from the last depth of previous subset. any idea how to solve it? — Shuang, Mar 23 '16 at 04:35

score 0 · Answer 2 · edited May 23 '17 at 11:59

A couple of data.table solutions:

foverlaps

We can use foverlaps in library(data.table)

Convert your data.frame into a data.table

setDT(dt)    ## put your df in here

Create a table of 'lookup depths' to give us our intervals

dt_depths <- data.table(depths_min = seq(min(dt$DEPTH), max(dt$DEPTH), by=3),
                        depths_max = seq(min(dt$DEPTH) + 3, max(dt$DEPTH) + 3, by=3))

#    depths_min depths_max
# 1:       4700       4703
# 2:       4703       4706
# 3:       4706       4709
# 4:       4709       4712
# 5:       4712       4715
# 6:       4715       4718

Set up the data for the foverlaps join

dt <- dt[, .(A, B, C, D, E, F, DEPTH)]  ## re-order for foverlap
dt[, DEPTH_copy := DEPTH]

setkey(dt_depths, depths_min, depths_max)
setkey(dt, DEPTH, DEPTH_copy)

## do the join
dt_join <- foverlaps(dt, 
                     dt_depths, 
                     type="any") ## any - to allow those on the 'borders' into both groups

We can add 'groups/subsets' if you wish:

dt_join[, subset := rleid(depths_min, depths_max)]


#    depths_min depths_max    A     B     C     D      E       F  DEPTH DEPTH_copy subset
# 1:       4700       4703 8.75 9.313 0.120 0.138 89.164  72.613 4700.0     4700.0      1
# 2:       4700       4703 8.75 9.264 0.117 0.135 89.266  72.784 4700.5     4700.5      1
# 3:       4700       4703 8.75 9.376 0.112 0.132 89.520  68.443 4701.0     4701.0      1
# 4:       4700       4703 8.75 9.485 0.110 0.122 89.088  64.839 4701.5     4701.5      1
# 5:       4700       4703 8.75 9.564 0.116 0.108 89.377  64.388 4702.0     4702.0      1
# 6:       4700       4703 8.75 9.572 0.121 0.098 88.930  66.931 4702.5     4702.5      1
# 7:       4700       4703 8.75 9.524 0.122 0.093 89.651  70.906 4703.0     4703.0      1
# 8:       4703       4706 8.75 9.524 0.122 0.093 89.651  70.906 4703.0     4703.0      2
# 9:       4703       4706 8.75 9.395 0.124 0.091 90.486  75.106 4703.5     4703.5      2
# 10:      4703       4706 8.75 9.245 0.123 0.089 90.598  77.443 4704.0     4704.0      2
...

And split it if you want it as a list

split(dt_join, dt_join$subset)

EACHI

Making use of this answer

## create a 'key_col' column to join on
dt_depths[, key_col := 1][ dt[, key_col := 1] , 
                                            { idx = depths_min <= i.DEPTH & i.DEPTH <= depths_max
                                            .(DEPTH = i.DEPTH,
                                            depths_min = depths_min[idx],
                                            depths_max = depths_max[idx],
                                            A = i.A,
                                            B = i.B,
                                            C = i.C,
                                            D = i.D,
                                            E = i.E,
                                            F = i.F)
                                            }, 
                                            on = "key_col",
                                            by=.EACHI]

subsets can then be identified using the same method in the foverlaps solution

Looping to subset data using depth criteria in r

2 Answers2