
I've got a data frame, say:

data.frame(x = c(1, 3), y = c(5, 0), id = c("A", "B"))

Now I want to duplicate it, so that a copy of its rows is stacked into the same data.frame. I'd end up with something like this:

data.frame(x = c(1, 3, 1, 3), y = c(5, 0, 5, 0), id = c("A", "B", "A", "B"))

This is pretty close to what I want, but I also want to append a suffix to the id column so that each row is unique, based on the number of duplicates I want (in this case just one, but I would want n of them).

data.frame(x = c(1, 3, 1, 3), y = c(5, 0, 5, 0), id = c("A-1", "B-1", "A-2", "B-2"))

As you can see, I can work out how to build these objects, but I would like to move away from "hacky" base R code and replicate this functionality with dplyr.

cylondude
  • Not particularly hacky in base R: `out <- dat[rep(1:nrow(dat), 2),]; out$id <- paste(out$id, rep(1:2, each=nrow(dat)), sep="-")` – thelatemail Mar 20 '17 at 00:35
  • Also relevant - http://stackoverflow.com/questions/38237350/repeating-rows-of-data-frame-in-dplyr - `dat %>% slice(rep(1:n(), 2))` will get you most of the way, but it's arguable that is more complicated. – thelatemail Mar 20 '17 at 00:44
  • I like your solution. Do you think it is misguided to try and use dplyr for this problem? It looks like your solution is pretty easy to read and speedy. – cylondude Mar 20 '17 at 19:10
  • I'd struggle to see if you could do it a) simpler than the 2 lines I posted and b) much quicker for large data. In this case, base R seems perfectly acceptable. – thelatemail Mar 20 '17 at 22:17
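Combining the two comments above, a minimal dplyr sketch for an arbitrary number of copies might look like this. It assumes dplyr is installed; `k` is a hypothetical name for the number of copies, and `id` is created as character (`stringsAsFactors = FALSE`) so the result is the same on older R versions. `slice(rep(seq_len(n()), times = k))` stacks the rows k times, and the copy number for each block of `nrow(dat)` rows comes from `rep(..., each = ...)`.

```r
library(dplyr)

dat <- data.frame(x = c(1, 3), y = c(5, 0), id = c("A", "B"),
                  stringsAsFactors = FALSE)

k <- 3  # hypothetical: number of copies wanted

out <- dat %>%
  # repeat the row indices 1..n() k times, i.e. stack k copies of dat
  slice(rep(seq_len(n()), times = k)) %>%
  # copy number 1..k, each repeated once per original row
  mutate(id = paste(id, rep(seq_len(k), each = nrow(dat)), sep = "-"))

out
```

As thelatemail notes, this is not really simpler than the two-line base R version; it just keeps everything in one pipe.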

1 Answer


Since you want to do this with the dplyr package, a combination of its group_by(), mutate(), and row_number() functions makes this work quite nicely.

library(dplyr)

# so you start with this data.frame:
df <- data.frame(x = c(1, 3), y = c(5, 0), id = c("A", "B"))

# to attach an exact copy of this df to itself:
df <- rbind(df, df)


# group by id, add a second id to increment within each id group ("A", "B", etc.)
df2 <- group_by(df, id) %>%
    mutate(id2 = row_number())


# paste the id and id2 together for the desired result
df2$id_combined <- paste0(df2$id, '-', df2$id2)

# inspect results
df2
#       x     y     id   id2 id_combined
#   <dbl> <dbl> <fctr> <int>       <chr>
# 1     1     5      A     1         A-1
# 2     3     0      B     1         B-1
# 3     1     5      A     2         A-2
# 4     3     0      B     2         B-2
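The same group_by() / mutate() / row_number() steps can also be chained into a single pipe that overwrites id directly and ends with a plain data.frame. This is a sketch under two assumptions: the ids in the original data frame are unique, and the copies were appended in order with rbind(), so row_number() within each id equals the copy number. `copy` is a hypothetical helper column name.

```r
library(dplyr)

df <- data.frame(x = c(1, 3), y = c(5, 0), id = c("A", "B"),
                 stringsAsFactors = FALSE)
df <- rbind(df, df)

out <- df %>%
  group_by(id) %>%
  mutate(copy = row_number()) %>%       # 1, 2, ... within each id
  ungroup() %>%                         # drop grouping before editing id
  mutate(id = paste0(id, "-", copy)) %>%
  select(-copy) %>%                     # remove the helper column
  as.data.frame()                       # back to a plain data.frame

out
```

Ungrouping before rewriting `id` avoids modifying a grouping variable, which some dplyr versions reject.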

Keep in mind that you now have a grouped tibble ("grouped_df"), not a basic data.frame.

It is simple to convert it back to a plain data.frame if you prefer:

# ungroup first, then convert to a plain data.frame
df2 <- as.data.frame(ungroup(df2))

# now remove the helper columns, keeping the combined key as the new id:
df2$id <- df2$id_combined
df2$id_combined <- NULL
df2$id2 <- NULL

Edit: exploring other options for attaching n copies of the same data frame to itself:

# Not dplyr, but this is how I would normally handle this type of task:
df <- data.frame(x = c(1, 3), y = c(5, 0), id = c("A", "B"))

# set n equal to the number of times you want to replicate the data.frame
n <- 13

# preallocate a list with one slot per copy
list_dfs <- vector("list", n)

# loop through, storing a copy of df in each slot
for (i in seq_len(n)) {
    list_dfs[[i]] <- df
}

# combine them all with do.call
my_big_df <- do.call(rbind, list_dfs)

From there, you can use the group_by(), mutate(), and row_number() functions as shown above to create your new unique key for the data.frame.
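If you want to stay within dplyr for the replication step as well, bind_rows() accepts a list of data frames, and its `.id` argument adds a column recording which list element each row came from; for an unnamed list that is a numeric sequence stored as character. A minimal sketch, assuming dplyr is installed and using `copy` as a hypothetical column name, that replaces both the loop and the group_by() step:

```r
library(dplyr)

df <- data.frame(x = c(1, 3), y = c(5, 0), id = c("A", "B"),
                 stringsAsFactors = FALSE)
n <- 13  # number of copies

my_big_df <- bind_rows(rep(list(df), n), .id = "copy") %>%  # copy = "1".."13"
  mutate(id = paste0(id, "-", copy)) %>%                    # "A-1", "B-1", ...
  select(-copy)                                             # drop the helper

head(my_big_df)
```

Because `.id` already carries the copy number, no grouping is needed to build the key, and the original ids do not have to be unique for the copy suffix itself to be correct.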

TaylorV
  • So how would I use dplyr to replicate the original data.frame? – cylondude Mar 20 '17 at 00:46
  • If by replicate you mean attach an exact copy of the existing data.frame to itself, then I would just do `df <- rbind(df, df)`. Maybe I'm not understanding, was the ultimate goal to create that additional id field with the "A-1" type format? – TaylorV Mar 20 '17 at 00:51
  • I want to attach n exact copies with dplyr – cylondude Mar 20 '17 at 01:02
  • I'm not sure how to do that with dplyr, but I did edit my answer to show how that can be done with a list and a call to `do.call(rbind, )` – TaylorV Mar 20 '17 at 01:16
  • At this point I'm wondering if it was wrong of me to try to accomplish this with dplyr? The comment from @thelatemail in my question seems to answer it pretty succinctly. – cylondude Mar 20 '17 at 19:08