2

I would like to duplicate a certain row based on information in a data frame. Prefer a tidyverse solution. I'd like to accomplish this without explicitly calling the original data frame in a function.

Here's a toy example.

data.frame(var1 = c("A", "A", "A", "B", "B"), 
           var2 = c(1, 2, 3, 4, 5), 
           val = c(21, 31, 54, 65, 76))
  var1 var2 val
1    A    1  21
2    A    2  31
3    A    3  54
4    B    4  65
5    B    5  76

All the solutions I've found so far require the user to input the desired row index. I'd like to find a way of doing it programmatically. In this case, I would like to duplicate the row where var1 is "A" with the highest value of var2 for "A" and append to the original data frame. The expected output is

  var1 var2 val
1    A    1  21
2    A    2  31
3    A    3  54
4    B    4  65
5    B    5  76
6    A    3  54
hmhensen
  • 2,974
  • 3
  • 22
  • 43

4 Answers4

3

A variation using dplyr. Find the max by group, filter for var1 and append.

library(dplyr)
df %>% 
  group_by(var1) %>% 
  filter(var2 == max(var2), 
         var1 == "A") %>% 
  bind_rows(df, .)

  var1 var2 val
1    A    1  21
2    A    2  31
3    A    3  54
4    B    4  65
5    B    5  76
6    A    3  54
neilfws
  • 32,751
  • 5
  • 50
  • 63
  • `df1` is a copy-paste error :) I always use `df1` for data frame names in my own code, changed it to `df` here but forgot the second one. – neilfws Aug 27 '20 at 04:09
1

You could select the row that you want to duplicate and add it to original dataframe :

library(dplyr)

var1_variable <- 'A'
df %>%
  filter(var1 == var1_variable) %>%
  slice_max(var2, n = 1) %>%
  #For dplyr < 1.0.0
  #slice(which.max(var2)) %>%
  bind_rows(df, .)

#  var1 var2 val
#1    A    1  21
#2    A    2  31
#3    A    3  54
#4    B    4  65
#5    B    5  76
#6    A    3  54

In base R, that can be done as :

df1 <- subset(df, var1 == var1_variable)
rbind(df, df1[which.max(df1$var2), ])

From this post we can save the previous work in a temporary variable and then bind rows so that we don't break the chain and don't bind the original dataframe df.

df %>%
  #Previous list of commands
  {
    {. -> temp} %>%
      filter(var1 == var1_variable) %>%
      slice_max(var2, n = 1) %>%
      bind_rows(temp)
  }
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • I actually came to the same conclusion, but for better or worse, I have a chain of previous functions applied to the original data and I am trying not to duplicate that work within the bind_rows function. I looked for a way to call the input data frame in bind_rows twice but no luck. – hmhensen Aug 27 '20 at 03:45
  • Something like `df %>% bind_rows(., . %>% slice(3))`. – hmhensen Aug 27 '20 at 03:47
  • I don't completely understand but you can break the chain into two parts? Store all the changes above this in `df1`, use the above code and `bind_rows` to `df1` ? – Ronak Shah Aug 27 '20 at 03:49
  • I could. I was just wondering if there was a way to do it without breaking it up. – hmhensen Aug 27 '20 at 03:50
  • I think the updated answer might help doing this without breaking the chain. – Ronak Shah Aug 27 '20 at 04:02
  • Yes, that works, thanks! Creating a temporary variable did it. – hmhensen Aug 27 '20 at 04:14
0

In base you can use rbind and subset to append the row(s) where var1 == "A" with the highest value of var2 to the original data frame.

rbind(x, subset(x[x$var1 == "A",], var2 == max(var2)))
#   var1 var2 val
#1     A    1  21
#2     A    2  31
#3     A    3  54
#4     B    4  65
#5     B    5  76
#31    A    3  54

Data:

x <- data.frame(var1 = c("A", "A", "A", "B", "B"), 
           var2 = c(1, 2, 3, 4, 5), 
           val = c(21, 31, 54, 65, 76))
GKi
  • 37,245
  • 2
  • 26
  • 48
0

An option with uncount

library(dplyr)
library(tidyr)
df1 %>%
     uncount(replace(rep(1, n()), match(max(val[var1 == 'A']), val), 2)) %>% 
     as_tibble
# A tibble: 6 x 3
#  var1   var2   val
#  <chr> <dbl> <dbl>
#1 A         1    21
#2 A         2    31
#3 A         3    54
#4 A         3    54
#5 B         4    65
#6 B         5    76
akrun
  • 874,273
  • 37
  • 540
  • 662