In a data frame, I am attempting to duplicate the first occurrence of a string into the same column, but also into the neighbouring column. More specifically, I want the first occurrence of each string in column v1 to be duplicated and inserted as a new row above itself, with the same string also placed in column v2 of that new row, as shown in the mock data frame below:

Input:

df_1<-data.frame("v1"=c(rep("a",times=3),rep("aa",times=4)),"v2"=c(c("b","c","d"),c("bb","cc","dd","ee")))
df_1
      v1 v2
    1  a  b
    2  a  c
    3  a  d
    4 aa bb
    5 aa cc
    6 aa dd
    7 aa ee

Expected output:

df_2<-data.frame("v1"=c(rep("a",times=4),rep("aa",times=5)),"v2"=c(c("a","b","c","d"),c("aa","bb","cc","dd","ee")))
df_2
      v1 v2
    1  a  a
    2  a  b
    3  a  c
    4  a  d
    5 aa aa
    6 aa bb
    7 aa cc
    8 aa dd
    9 aa ee

So in this case, the first occurrence of "a" and of "aa" has been duplicated and inserted into the same data frame above its first occurrence.

I hope my question makes sense.

Best, Rikki

4 Answers

One dplyr/tidyr option could be:

library(dplyr)
library(tidyr)  # provides uncount()

df_1 %>%
 group_by(v1) %>%
 uncount((row_number() == 1) + 1) %>%
 mutate(v2 = if_else(row_number() == 1, first(v1), v2))

  v1    v2   
  <chr> <chr>
1 a     a    
2 a     b    
3 a     c    
4 a     d    
5 aa    aa   
6 aa    bb   
7 aa    cc   
8 aa    dd   
9 aa    ee   
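
For readers unfamiliar with uncount(): it comes from tidyr and repeats each row according to a weight. A minimal sketch on a made-up two-row data frame (the names toy, x and n are hypothetical, chosen only for illustration):

```r
library(tidyr)

# Hypothetical toy data: n is the number of copies wanted for each row
toy <- data.frame(x = c("a", "b"), n = c(2, 1), stringsAsFactors = FALSE)

# uncount() repeats each row n times and drops the weight column by default
expanded <- uncount(toy, n)

# The same kind of weight as in the answer: 2 for the first row, 1 otherwise
weights <- (seq_len(3) == 1) + 1
```

In the answer above, (row_number() == 1) + 1 plays the role of n, so only the first row of each group is doubled.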
tmfmnk

Here is a base R idea:

 do.call(rbind, lapply(split(df_1, df_1$v1), function(i)
                                 rbind(data.frame(v1 = i$v1[1], v2 = i$v1[1]), i)))
#     v1 v2
#a.1   a  a
#a.2   a  b
#a.3   a  c
#a.4   a  d
#aa.1 aa aa
#aa.4 aa bb
#aa.5 aa cc
#aa.6 aa dd
#aa.7 aa ee

NOTE: You can use rownames(x) <- NULL on the result x to remove the row names if they bother you.

EDIT: Apparently there is a make.row.names argument in the data.frame method of rbind, as pointed out in the comments by @Jaap:

do.call(rbind, c(lapply(split(df_1, df_1$v1),
                        function(i) rbind(data.frame(v1 = i$v1[1], v2 = i$v1[1]), i)),
                 make.row.names = FALSE)
        )
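
One caveat worth checking: split() groups by factor levels, which are sorted by default, so the groups come back in alphabetical order rather than in order of appearance. If the original order matters, a possible tweak is to fix the levels first. This is only a sketch, and df_unsorted is a made-up data frame for illustration:

```r
# Hypothetical input where the groups are not in alphabetical order
df_unsorted <- data.frame(v1 = c("b", "b", "a", "a"),
                          v2 = c("w", "x", "y", "z"),
                          stringsAsFactors = FALSE)

# Fix the factor levels to the order of first appearance before splitting
groups <- split(df_unsorted,
                factor(df_unsorted$v1, levels = unique(df_unsorted$v1)))

# Same idea as above: prepend one row per group with v1 copied into v2
res <- do.call(rbind, c(lapply(groups, function(g)
  rbind(data.frame(v1 = g$v1[1], v2 = g$v1[1], stringsAsFactors = FALSE), g)),
  make.row.names = FALSE))
```

Here res keeps the "b" block before the "a" block, which a plain split(df_unsorted, df_unsorted$v1) would reverse.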
Jaap
Sotos

You can use rep to copy the matching rows and then overwrite v2:

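i <- !duplicated(df_1$v1)                 # TRUE at the first occurrence of each v1 value
df_2 <- df_1[rep(seq_along(i), 1 + i), ]  # repeat those rows twice, all others once
i <- which(i)                             # original row numbers of the first occurrences
i <- i + seq(0, length.out = length(i))   # shift by the number of copies inserted before each
df_2$v2[i] <- df_2$v1[i]                  # overwrite v2 in the inserted copies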
#df_2[i,] <- df_2$v1[i]   #Alternative
#df_2[i,-1] <- df_2$v1[i] #Alternative
df_2
#    v1 v2
#1    a  a
#1.1  a  b
#2    a  c
#3    a  d
#4   aa aa
#4.1 aa bb
#5   aa cc
#6   aa dd
#7   aa ee
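
To make the index arithmetic easier to follow, here is the same approach with each intermediate value kept under its own name. This is a sketch only; the names first, idx and pos are mine, not part of the answer:

```r
df_1 <- data.frame(v1 = c(rep("a", times = 3), rep("aa", times = 4)),
                   v2 = c("b", "c", "d", "bb", "cc", "dd", "ee"),
                   stringsAsFactors = FALSE)

# TRUE at the first row of each v1 value: rows 1 and 4 here
first <- !duplicated(df_1$v1)

# Row index with the first rows listed twice: 1 1 2 3 4 4 5 6 7
idx <- rep(seq_along(first), 1 + first)

# Position of each inserted copy in the expanded frame: every earlier
# insertion shifts later rows down by one, giving 1 and 5
pos <- which(first) + seq_along(which(first)) - 1

df_2 <- df_1[idx, ]
df_2$v2[pos] <- df_2$v1[pos]  # copy v1 into v2 for the inserted rows
rownames(df_2) <- NULL        # optional: drop the 1.1, 4.1 row names
```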
GKi

Here's one dplyr solution:

library(dplyr) 

df_1 %>% 
  select(v1) %>% 
  mutate(v2 = v1) %>% 
  unique() %>% 
  rbind(df_1) %>% 
  arrange(v1)

Which gives:

  v1 v2
1   a  a
11  a  b
2   a  c
3   a  d
4  aa aa
41 aa bb
5  aa cc
6  aa dd
7  aa ee
Matt
  • Thanks a lot Matt. Even without familiarity with dplyr, this seems very logical. Just one thing: the punctuation (.) in rbind(.,) calls the result from the unique() function, right? – Rikki Franklin Frederiksen Jul 14 '20 at 15:19
  • After you asked this, I re-ran the code above without the dot, and it turns out you don't actually need it. It is used to pass the transformed data from the left hand side to the right hand side (you can read more here: https://stackoverflow.com/questions/35272457/what-does-the-dplyr-period-character-reference#:~:text=The%20dot%20is%20used%20within,reference%20single%20columns%20by%20using%20.) – Matt Jul 14 '20 at 16:09
  • Thanks for clarifying Matt. – Rikki Franklin Frederiksen Jul 14 '20 at 17:31
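
The point about the dot discussed in the comments can be checked with a small sketch. It assumes the magrittr pipe (the %>% that dplyr re-exports); the data frames a and b are made up for illustration:

```r
library(magrittr)

# Two hypothetical one-row data frames
a <- data.frame(v1 = "a", v2 = "a", stringsAsFactors = FALSE)
b <- data.frame(v1 = "a", v2 = "b", stringsAsFactors = FALSE)

# Without a dot, the left-hand side becomes the first argument: rbind(a, b)
implicit <- a %>% rbind(b)

# With an explicit dot in first position, the call is the same
explicit <- a %>% rbind(., b)

# The dot only matters when the left-hand side should go somewhere else,
# for example as the second argument: rbind(b, a)
swapped <- a %>% rbind(b, .)
```

So implicit and explicit are the same data frame, while swapped has the row from b first.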