Duplicate first occurrence of string into column of data frame

Question

In a data frame, I am trying to duplicate the first occurrence of a string within its own column and also insert it into the neighbouring column. More specifically, for the first occurrence of each string in column v1, I want a new row inserted directly above it that holds that string in both v1 and v2, as in the mock data frame below:

Input:

df_1<-data.frame("v1"=c(rep("a",times=3),rep("aa",times=4)),"v2"=c(c("b","c","d"),c("bb","cc","dd","ee")))
df_1
      v1 v2
    1  a  b
    2  a  c
    3  a  d
    4 aa bb
    5 aa cc
    6 aa dd
    7 aa ee

Expected output:

df_2<-data.frame("v1"=c(rep("a",times=4),rep("aa",times=5)),"v2"=c(c("a","b","c","d"),c("aa","bb","cc","dd","ee")))
df_2
      v1 v2
    1  a  a
    2  a  b
    3  a  c
    4  a  d
    5 aa aa
    6 aa bb
    7 aa cc
    8 aa dd
    9 aa ee

So in this case, the first occurrence of "a" and of "aa" has each been duplicated and inserted into the same data frame directly above its original row.

I hope my question makes sense.

Best, Rikki

score 2 · Answer 1 · answered Jul 13 '20 at 14:14

One dplyr option could be:

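library(dplyr)
library(tidyr)  # uncount() comes from tidyr

df_1 %>%
 group_by(v1) %>%
 uncount((row_number() == 1) + 1) %>%  # repeat the first row of each group once
 mutate(v2 = if_else(row_number() == 1, first(v1), v2))  # copy v1 into v2 on the new first row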

  v1    v2   
  <chr> <chr>
1 a     a    
2 a     b    
3 a     c    
4 a     d    
5 aa    aa   
6 aa    bb   
7 aa    cc   
8 aa    dd   
9 aa    ee
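
To see why the uncount() step duplicates only the first row of each group, it can help to look at the weight on its own. The following is a minimal sketch, assuming dplyr is installed; the column name w and the object name weights are introduced here purely for illustration and are not part of the answer above.

```r
library(dplyr)

df_1 <- data.frame(v1 = c(rep("a", 3), rep("aa", 4)),
                   v2 = c("b", "c", "d", "bb", "cc", "dd", "ee"),
                   stringsAsFactors = FALSE)

# Hypothetical helper column `w`: the number of copies uncount() would make of each row.
# (row_number() == 1) is TRUE only for the first row of a group, and TRUE + 1 is 2.
weights <- df_1 %>%
  group_by(v1) %>%
  mutate(w = (row_number() == 1) + 1) %>%
  ungroup()

weights$w  # 2 1 1 2 1 1 1
```

Rows with a weight of 2 appear twice in the result, and the later mutate() then overwrites v2 in the first of the two copies.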

answered Jul 13 '20 at 14:14

tmfmnk

Thanks a lot tmfmnk. I haven't familiarized myself with dplyr yet, but I will do my best to get my head around this solution :) – Rikki Franklin Frederiksen Jul 14 '20 at 13:20

score 2 · Answer 2 · edited Jul 13 '20 at 15:20

Here is a base R idea:

do.call(rbind, lapply(split(df_1, df_1$v1), function(i)
  rbind(data.frame(v1 = i$v1[1], v2 = i$v1[1]), i)))
#     v1 v2
#a.1   a  a
#a.2   a  b
#a.3   a  c
#a.4   a  d
#aa.1 aa aa
#aa.4 aa bb
#aa.5 aa cc
#aa.6 aa dd
#aa.7 aa ee

NOTE: If the row names bother you, you can reset them with rownames(x) <- NULL, where x is the resulting data frame.

EDIT: As @Jaap pointed out in the comments, the data.frame method of rbind has a make.row.names argument:

do.call(rbind, c(lapply(split(df_1, df_1$v1),
                        function(i) rbind(data.frame(v1 = i$v1[1], v2 = i$v1[1]), i)),
                 make.row.names = FALSE)
        )
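
One thing to be aware of: split() orders the pieces by the sorted values of v1, so groups come back in alphabetical order rather than in their order of appearance. If the original group order matters, a possible variation is to split on a factor whose levels follow first appearance. This is only a sketch; the function name add_group_header and the sample data df_x are hypothetical and not from the answer above.

```r
# Hypothetical input whose groups are not in alphabetical order.
df_x <- data.frame(v1 = c("b", "b", "a", "a"),
                   v2 = c("p", "q", "r", "s"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: prepend one row per group in which v2 repeats the v1 label.
add_group_header <- function(df) {
  # Levels in order of first appearance, so split() keeps the original group order.
  groups <- factor(df$v1, levels = unique(df$v1))
  pieces <- lapply(split(df, groups), function(g) {
    rbind(data.frame(v1 = g$v1[1], v2 = g$v1[1], stringsAsFactors = FALSE), g)
  })
  # make.row.names = FALSE avoids the "b.1", "b.2", ... row names.
  do.call(rbind, c(pieces, make.row.names = FALSE))
}

res <- add_group_header(df_x)
res$v2  # "b" "p" "q" "a" "r" "s"
```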

edited Jul 13 '20 at 15:20

Jaap

answered Jul 13 '20 at 14:20

Sotos

`do.call(rbind, c(lapply(split(df_1, df_1$v1), function(i) rbind(data.frame(v1 = i$v1[1], v2 = i$v1[1]), i)), make.row.names = FALSE))` – Jaap Jul 13 '20 at 14:31

Sotos and Jaap, thank you so much for this compact solution. – Rikki Franklin Frederiksen Jul 14 '20 at 17:49

GKi · Answer 3 · 2020-07-14T05:47:22.290

1

You can use rep to copy the matching rows and then overwrite v2:

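i <- !duplicated(df_1$v1)                   # TRUE at the first occurrence of each v1 value
df_2 <- df_1[rep(seq_along(i), 1 + i), ]    # repeat those rows twice, all others once
i <- which(i)                               # original positions of the first occurrences
i <- i + seq(0, length.out = length(i))     # shift by the number of rows inserted above
df_2$v2[i] <- df_2$v1[i]                    # overwrite v2 in the first copy of each pair
#df_2[i,] <- df_2$v1[i]   #Alternative
#df_2[i,-1] <- df_2$v1[i] #Alternative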
df_2
#    v1 v2
#1    a  a
#1.1  a  b
#2    a  c
#3    a  d
#4   aa aa
#4.1 aa bb
#5   aa cc
#6   aa dd
#7   aa ee
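
The index arithmetic is the least obvious part, so here is a small walk-through of the intermediate values in base R. The variable names first, idx and pos are introduced only for this illustration. Each duplicated row pushes everything below it down by one, so the k-th first occurrence ends up k - 1 positions lower than it started.

```r
df_1 <- data.frame(v1 = c(rep("a", 3), rep("aa", 4)),
                   v2 = c("b", "c", "d", "bb", "cc", "dd", "ee"),
                   stringsAsFactors = FALSE)

first <- !duplicated(df_1$v1)            # TRUE FALSE FALSE TRUE FALSE FALSE FALSE
idx <- rep(seq_along(first), 1 + first)  # 1 1 2 3 4 4 5 6 7 (rows 1 and 4 appear twice)

# Rows 1 and 4 were first occurrences. After duplication, the first copy of
# row 4 sits at position 5, because one extra row was inserted above it.
pos <- which(first) + seq_along(which(first)) - 1  # 1 5

idx[pos]  # 1 4
```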

edited Jul 14 '20 at 05:47

answered Jul 13 '20 at 14:14

GKi

GKi, thank you so much for the solution. I have to admit it's quite advanced for me, but I will take it bit by bit :). – Rikki Franklin Frederiksen Jul 14 '20 at 18:48

Matt · Answer 4 · 2020-07-14T17:41:13.000

1

Here's one dplyr solution:

library(dplyr) 

df_1 %>% 
  select(v1) %>% 
  mutate(v2 = v1) %>% 
  unique() %>% 
  rbind(df_1) %>% 
  arrange(v1)

Which gives:

   v1 v2
1   a  a
11  a  b
2   a  c
3   a  d
4  aa aa
41 aa bb
5  aa cc
6  aa dd
7  aa ee
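
Note that arrange(v1) sorts the groups alphabetically, which happens to match the input order in this example. If the groups in your data are not already sorted and you want to keep their original order, one possible variation is to sort by first appearance instead. This is a sketch only; the sample data df_x is hypothetical, and it relies on arrange() keeping tied rows in their existing order.

```r
library(dplyr)

# Hypothetical input whose groups are not in alphabetical order.
df_x <- data.frame(v1 = c("b", "b", "a", "a"),
                   v2 = c("p", "q", "r", "s"),
                   stringsAsFactors = FALSE)

res <- df_x %>%
  select(v1) %>%
  mutate(v2 = v1) %>%
  unique() %>%
  rbind(df_x) %>%
  # Order by the position at which each v1 value first appears, not alphabetically.
  arrange(match(v1, unique(v1)))

res$v2  # "b" "p" "q" "a" "r" "s"
```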

edited Jul 14 '20 at 17:41

answered Jul 13 '20 at 14:16

Matt

Thanks a lot Matt. Even without familiarity with dplyr, this seems very logical. Just one thing: the dot (.) in rbind(.,) refers to the result of the unique() call, right? – Rikki Franklin Frederiksen Jul 14 '20 at 15:19
After you asked this, I re-ran the code above without the dot, and it turns out you don't actually need it. It is used to pass the transformed data from the left hand side to the right hand side (you can read more here: https://stackoverflow.com/questions/35272457/what-does-the-dplyr-period-character-reference#:~:text=The%20dot%20is%20used%20within,reference%20single%20columns%20by%20using%20.) – Matt Jul 14 '20 at 16:09
Thanks for clarifying Matt. – Rikki Franklin Frederiksen Jul 14 '20 at 17:31
