
I'd like to know if it's possible to use dplyr to expand the rows of a data.frame based on criteria in each row. If it's not possible in dplyr, I'd be happy with any solution!

Here is a sample of my data:

data.frame(plot=rep(c(6,7),each=4),
           trans=rep(c("0,0","0,100","100,100","100,0"),2),
           length_m=c(350,200,200,50,45,200,125,75))

plot   trans length_m
6     0,0      350
6   0,100      200
6 100,100      200
6   100,0       50
7     0,0       45
7   0,100      200
7 100,100      125
7   100,0       75

The data above represent two plots. In general each of my plots has 1 to 4 transects, identified by 0,0; 0,100; 100,100; or 100,0 (the plots above both have all four possible transects). Each transect has a length given by length_m. What I'd like to do is divide each transect into sections of length L, making one row for each new section. If the final section would be shorter than L, its distance should be added to the previous section instead.

So, if L = 100, the above dataset would look like this:

plot    trans length_m
6       0,0_0      100
6     0,0_100      100
6     0,0_200      150
6     0,100_0      100
6   0,100_100      100
6   100,100_0      100
6 100,100_100      100
6     100,0_0       50
7       0,0_0       45
7     0,100_0      100
7   0,100_100      100
7   100,100_0      125
7     100,0_0       75

Note that transect 0,0 in plot 6, which was 350 meters long, is split into sections 0, 100 & 200 with lengths 100, 100 & 150, while transect 100,0 in plot 6, which was only 50 meters long, remains a single section (0) that is still 50 meters long.
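
To make the rule concrete, here is a rough base R sketch of how a single length would be split (split_len is just an illustrative name; this is only the per-length rule, not a working solution for the whole data.frame):

# illustrative splitting rule for a single transect length
split_len <- function(len, L = 100) {
  if (len < L) return(len)                 # shorter than L: keep as one section
  sections <- rep(L, floor(len / L))       # full sections of length L
  # fold any leftover distance into the last section
  sections[length(sections)] <- sections[length(sections)] + (len - sum(sections))
  sections
}
split_len(350)  # 100 100 150
split_len(50)   # 50

What I can't work out is how to apply something like this per row and expand the data.frame accordingly.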

I've tried a couple of different ways to make this work but nothing that is worth posting, so any help would be much appreciated!

Ben Carlson
  • This is not difficult to achieve. You only need to group by `plot` and `trans`. But as a data.table user I'll leave answering to the dplyr fans since you seem to prefer that. – Roland Jun 05 '14 at 15:38
  • Should the `sum(length)` be the same after the transformation? ie should the "0,0_300" line have length 50 because "0,0" was length 350 before but now it's 450? Shouldn't all the lengths be less than L after this transformation? – MrFlick Jun 05 '14 at 16:09
  • Hi MrFlick - Great catch! Sorry I messed up the desired output. Yes you are correct sum(length) should be the same after the transformation. I've fixed the output, hopefully that makes more sense. All lengths might not be less than L. This is because if the end of the transect has a residual at the end that is less than L, it should be added to the previous transect. This is why the 350 m transect is split into sections of length 100,100,150. Hope that is helpful! – Ben Carlson Jun 05 '14 at 18:16

2 Answers


Here's a data.table solution, assuming your original data is in a data frame `df`.

df$trans <- as.character(df$trans)   # need trans to be char, not factor
library(data.table)
dt <- data.table(df)         
L <- 100
f <- function(x) {                   # implements the partitioning
  if (x<L) return(x)
  y <- rep(L,as.integer(x/L))
  y[length(y)] <- y[length(y)]+x-sum(y)
  return(y)
}
result <- dt[,list(length_m=f(length_m)),by=list(plot,trans)]
result[,trans:=paste(trans,L*(0:(.N-1)),sep="_"),by=list(plot,trans)]
result
#     plot       trans length_m
#  1:    6       0,0_0      100
#  2:    6     0,0_100      100
#  3:    6     0,0_200      150
#  4:    6     0,100_0      100
#  5:    6   0,100_100      100
#  6:    6   100,100_0      100
#  7:    6 100,100_100      100
#  8:    6     100,0_0       50
#  9:    7       0,0_0       45
# 10:    7     0,100_0      100
# 11:    7   0,100_100      100
# 12:    7   100,100_0      125
# 13:    7     100,0_0       75
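
A quick sanity check (assuming `dt` and `result` from above) that no length is lost or gained in the split:

# total transect length is unchanged by the split
dt[, sum(length_m)]       # 1245
result[, sum(length_m)]   # 1245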
jlhoward

Here's a dplyr solution - not very elegant though.

df <- data.frame(plot=rep(c(6,7),each=4),
           trans=rep(c("0,0","0,100","100,100","100,0"),2),
           length_m=c(350,200,200,50,45,200,125,75)        )

library(dplyr)
library(tidyr)  # for complete()

df %>% 
  mutate(rnum = row_number(),
         freq = pmax(floor(length_m/100),1)) %>%
  group_by(rnum) %>% complete(
    freq = 1:freq
  ) %>% mutate_all(
    funs(last(.))
  ) %>% mutate(
    within.rnum = row_number(),
    trans = paste0(trans,"_",100*(within.rnum-1)),
    length_m = ifelse(within.rnum==n(),(length_m - 100*(floor(length_m/100))) + 100*(length_m>100),100)
  ) %>% ungroup %>% select(-rnum,-within.rnum,-freq)

#    plot       trans length_m
# 1     6       0,0_0      100
# 2     6     0,0_100      100
# 3     6     0,0_200      150
# 4     6     0,100_0      100
# 5     6   0,100_100      100
# 6     6   100,100_0      100
# 7     6 100,100_100      100
# 8     6     100,0_0       50
# 9     7       0,0_0       45
# 10    7     0,100_0      100
# 11    7   0,100_100      100
# 12    7   100,100_0      125
# 13    7     100,0_0       75

For "expanding" datasets in dplyr, I found the following structure to work: group by `row_number()` and then apply `complete()`, e.g.:

df <- data_frame(x=runif(n=6),y=x+runif(n=6),freq=c(1,1,2,2,3,4))

df %>% mutate(rnum = row_number()) %>%
  group_by(rnum) %>% complete(
    freq = 1:freq
  ) %>% mutate_all(
    funs(last(.))
  )

# Source: local data frame [13 x 4]
# Groups: rnum [6]
# 
# rnum  freq         x         y
# <int> <dbl>     <dbl>     <dbl>
#   1      1     1 0.8894632 1.4368569
# 2      2     1 0.4325821 0.9366039
# 3      3     2 0.2039089 0.6234862
# 4      3     2 0.2039089 0.6234862
# 5      4     2 0.9493441 1.5977998
# 6      4     2 0.9493441 1.5977998
# 7      5     3 0.9806209 1.7840731
# 8      5     3 0.9806209 1.7840731
# 9      5     3 0.9806209 1.7840731
# 10     6     4 0.8778605 1.4682580
# 11     6     4 0.8778605 1.4682580
# 12     6     4 0.8778605 1.4682580
# 13     6     4 0.8778605 1.4682580

Though I think there are simpler ways to do this using base R (e.g. the answers to Replicate each row of data.frame and specify the number of replications for each row).
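
For instance, a rough base R sketch along those lines (reusing the `f()` partition helper and `L` from the data.table answer above, and assuming `df` still holds the original plot/trans/length_m data) might look like this:

# replicate each row once per section, then fix up the lengths and labels
n_sec <- pmax(floor(df$length_m / L), 1)         # number of sections per transect
out   <- df[rep(seq_len(nrow(df)), n_sec), ]     # expand rows
out$length_m <- unlist(lapply(df$length_m, f))   # per-section lengths
out$trans    <- paste(out$trans, L * (sequence(n_sec) - 1), sep = "_")
rownames(out) <- NULL
out

The key step is that `rep()` with a `times` vector does the row replication in one call.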

Richard DiSalvo