Duplicating rows in dataframe based on column value

Question

I am trying to duplicate rows based on the value of a column. My dataframe (df) currently looks like:

Species name	Visits
Apis m	4
Bombus l	7

And so on (there are 34 more columns, all of which need to be repeated). I want it to look like:

Species name
Apis m
Apis m
Apis m
Apis m
Bombus l
Bombus l
Bombus l
Bombus l
Bombus l
Bombus l
Bombus l

This is already a fairly large dataset of 1767 observations; there are 190 'Species Name' values and each one has been visited several hundred times.

I'm very new to R (and coding!) so everything is very 'trial and error'. I found a solution on Stack Overflow using "splitstackshape" but am getting the error

"Error in .subset2(x, i, exact = exact) : recursive indexing failed at level 2".

This is my code:

expandRows(df, df$Visits, 
           count.is.col = TRUE, drop = TRUE)

There are questions about other instances of this error, but none related to the 'expand rows' function. The column is stored as an integer and I've removed any null values from the 'Visits' column.

Any pointers as to what my issue might be or other ideas of how to do this would be much appreciated.

Danielle

Edit: Reprex below. I'm not sure what 'could not find function' relates to, as the code appeared to run outside of the reprex. Also, note that this includes the actual column names and df; I simplified them in the example above.

expandRows(BombusL, BombusL$No.of.Interaction.Records, count.is.col = TRUE, 
    drop = TRUE)
#> Error in expandRows(BombusL, BombusL$No.of.Interaction.Records, count.is.col = TRUE, : could not find function "expandRows"

please provide a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) using [`dput`](https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/dput) or a [`reprex`](https://reprex.tidyverse.org/) — EJJ, Apr 25 '21 at 14:38
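A note on the two errors in the question, as a hedged sketch rather than a definitive diagnosis. The "recursive indexing failed" message comes from `.subset2`, i.e. `[[`, which suggests that with `count.is.col = TRUE` the `count` argument is used to look up a column. Passing the whole vector `df$Visits` would then be indexed recursively and fail; passing the column name `"Visits"`, or keeping the vector and setting `count.is.col = FALSE`, should avoid that. The "could not find function" message in the reprex most likely appears because a reprex runs in a fresh session, so `library(splitstackshape)` has to be part of the reprex code. The toy data frame below is an assumption modelled on the simplified example.

```r
# Toy data mirroring the question's simplified example (assumed values)
df <- data.frame(
  Species.name = c("Apis m", "Bombus l"),
  Visits = c(4L, 7L),
  stringsAsFactors = FALSE
)

# Using a whole vector inside `[[` asks for recursive indexing, which is
# the kind of lookup that fails in the question. Capture the error instead
# of stopping the script.
lookup <- tryCatch(df[[df$Visits]], error = function(e) e)
inherits(lookup, "error")

# Passing the column name is what count.is.col = TRUE expects.
# splitstackshape may not be installed, so the call is guarded.
if (requireNamespace("splitstackshape", quietly = TRUE)) {
  expanded <- splitstackshape::expandRows(df, "Visits")
  print(nrow(expanded))
}
```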

score 1 · Answer 1 · answered Apr 25 '21 at 14:54


You can try `uncount` from the tidyr/tidyverse package:

library(tidyr)

data <- data.frame(Species = c("Apis m","Nimbus"),Visits = c(4,7))
data %>% 
  uncount(Visits)
#>     Species
#> 1    Apis m
#> 1.1  Apis m
#> 1.2  Apis m
#> 1.3  Apis m
#> 2    Nimbus
#> 2.1  Nimbus
#> 2.2  Nimbus
#> 2.3  Nimbus
#> 2.4  Nimbus
#> 2.5  Nimbus
#> 2.6  Nimbus

Created on 2021-04-25 by the reprex package (v2.0.0)

Ran K

Thanks Ran K, I don't think this will work, as I have 190-odd different species and I think I would have to list them all out using this? – Danielle Edwards Apr 26 '21 at 10:01
You don't need to list all the species; I only listed them in order to populate a data frame similar to yours. Does `df %>% uncount(Visits)` work? – Ran K Apr 26 '21 at 10:41
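To illustrate the point made in the comment above, here is a small sketch showing that `uncount` repeats whole rows, so every other column is carried along without naming any species. The `Site` column and its values are hypothetical stand-ins for the 34 extra columns mentioned in the question.

```r
library(tidyr)

# Hypothetical data: one count column plus an extra column to carry along
df <- data.frame(
  Species.name = c("Apis m", "Bombus l"),
  Site = c("Meadow", "Hedgerow"),
  Visits = c(4, 7),
  stringsAsFactors = FALSE
)

# Each row is repeated Visits times; the Visits column itself is dropped
# by default (.remove = TRUE), and all remaining columns are kept.
expanded <- uncount(df, Visits)

nrow(expanded)
names(expanded)
```

If the count column should be kept in the result, `uncount(df, Visits, .remove = FALSE)` does that.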

TarJae · Accepted Answer · 2021-04-25T15:22:42.800

1

Update (as uncount is already mentioned):

With your code:

df.expanded <- df[rep(row.names(df), df$Visits), 1:2]

Or you could use `slice` with `seq_len(n())`:

library(dplyr)
df %>%  
  slice(rep(seq_len(n()), Visits)) %>% 
  select(-Visits)

Output:

   Species.name
   <chr>       
 1 Apis m      
 2 Apis m      
 3 Apis m      
 4 Apis m      
 5 Bombus l    
 6 Bombus l    
 7 Bombus l    
 8 Bombus l    
 9 Bombus l    
10 Bombus l    
11 Bombus l

edited Apr 25 '21 at 15:22

answered Apr 25 '21 at 15:11

TarJae


Thanks TarJae, neither of these worked unfortunately; I got the error message "invalid 'times' argument" – Danielle Edwards Apr 26 '21 at 10:06
Scratch that, the second solution works! Turns out that I hadn't successfully removed all of the NA values, thanks! – Danielle Edwards Apr 26 '21 at 10:55
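The NA problem described in these comments can be reproduced and worked around in base R. The sketch below is only an illustration under assumed data: the third species and its missing count are made up. It drops rows with a missing count before repeating, and it selects columns by name rather than by position, so any number of extra columns would be kept.

```r
# Assumed data: the last row has a missing visit count
df <- data.frame(
  Species.name = c("Apis m", "Bombus l", "Osmia b"),
  Visits = c(4L, 7L, NA),
  stringsAsFactors = FALSE
)

# rep() rejects NA repeat counts; capture the message instead of stopping
err <- tryCatch(
  rep(seq_len(nrow(df)), df$Visits),
  error = function(e) conditionMessage(e)
)
print(err)

# Drop rows with a missing count, then repeat each remaining row index
clean <- df[!is.na(df$Visits), ]
idx <- rep(seq_len(nrow(clean)), clean$Visits)

# Keep every column except the count column
expanded <- clean[idx, names(clean) != "Visits", drop = FALSE]
rownames(expanded) <- NULL

nrow(expanded)
```

Whether rows with a missing count should be dropped or treated as zero visits depends on the data; replacing `NA` with `0` before calling `rep()` would also remove the error.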
