Adding duplicated rows to a dataframe

Question

This may have been already solved somewhere else, but I can't find any specific link, so I'll be happy to see a "duplicated" tag...

I have a dataframe with rows that go like the following one:

  x y z lon lat count
1 A B C   0   0     3
2 B D Q   1   2     2

Now, to plot the data with ggmap (I'm new and still learning the grammar of graphics), specifically with stat_bin2d, I think I have to transform the data above as follows:

  x y z lon lat 
1 A B C   0   0
2 A B C   0   0
3 A B C   0   0
4 B D Q   1   2
5 B D Q   1   2

Questions:

1) Is my assumption correct?

2) How can I reach my goal?

I've tried several ways to use rbind without a for loop, but none of them solved my problem... The only approach I can think of, with my limited knowledge of R, is something along the lines of

my_df <- structure(list(x = structure(1:2, .Label = c("A", "B"), class = "factor"), 
                        y = structure(1:2, .Label = c("B", "D"), class = "factor"), 
                        z = structure(1:2, .Label = c("C", "Q"), class = "factor"), 
                        lon = c(0, 1), lat = c(0, 2), count = c(3, 2)), 
                   .Names = c("x", "y", "z", "lon", "lat", "count"), 
row.names = 1:2, class = "data.frame")

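for (i in seq_len(nrow(my_df))){
    # seq_len() gives an empty loop when count is 1; 1:(count-1) would iterate over c(1, 0)
    for (j in seq_len(my_df$count[i] - 1)){
        my_df <- rbind(my_df, my_df[i,])}}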
row.names(my_df) <- 1:nrow(my_df)
my_df <- my_df[,1:5]

Result is:

  x y z lon lat
1 A B C   0   0
2 B D Q   1   2
3 A B C   0   0
4 A B C   0   0
5 B D Q   1   2

It works, but I'd like to learn a better way to reach my goal.

score 2 · Answer 1 · edited May 23 '17 at 11:45

You can do:

my_df[rep(seq_len(nrow(my_df)), times = my_df$count), ]

See this post

answered Jul 31 '15 at 09:27

Victorp

I like the beauty and simplicity of this solution, but I don't really get all of how it works. It "subsets" `my_df` (actually, expanding it) including all columns (and this can be improved to drop the last one with the values), but how does it know how many times to repeat each row of `my_df` with just a single call to `my_df$count`? – MaZe Jul 31 '15 at 10:16
I got it now. The single call returns a vector of values, and that is how each row gets replicated the exact number of times. Thanks, this is beautiful. – MaZe Jul 31 '15 at 10:39
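
To make the mechanism discussed in these comments concrete, here is a minimal sketch that splits the one-liner into two steps and also drops the count column. It rebuilds the question's data with plain character columns for brevity; the names `idx` and `res` are only illustrative.

```r
# Same toy data as in the question (character columns instead of factors).
my_df <- data.frame(x = c("A", "B"), y = c("B", "D"), z = c("C", "Q"),
                    lon = c(0, 1), lat = c(0, 2), count = c(3, 2))

# rep() pairs each row number with its own count:
# row 1 is repeated three times, row 2 twice.
idx <- rep(seq_len(nrow(my_df)), times = my_df$count)
idx
# [1] 1 1 1 2 2

# Index the rows with the repeated numbers and drop the count column by name.
res <- my_df[idx, names(my_df) != "count"]
row.names(res) <- NULL  # replace row names such as "1.1", "1.2" with 1..n
res
```

Because `times` is a vector of the same length as the row index, each element of the index is repeated by the matching element of `my_df$count`, so no loop is needed.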

akrun · Accepted Answer · score 1 · 2015-07-31T09:36:11.330

We can use the convenient function expandRows from splitstackshape to replicate the rows according to the 'count' column.

library(splitstackshape)
res <- expandRows(my_df, 'count')
row.names(res) <- NULL
res
#  x y z lon lat
#1 A B C   0   0
#2 A B C   0   0
#3 A B C   0   0
#4 B D Q   1   2
#5 B D Q   1   2

answered Jul 31 '15 at 09:29

akrun

874,273
37
540
662

Thanks. It seems, anyway, that what the function `expandRows` does is essentially the same as the simpler solution above, in a more elaborate and "professional" way: see https://github.com/mrdwab/splitstackshape/blob/master/R/expandRows.R – MaZe Jul 31 '15 at 10:12
@MaZe It has some additional functionality as well as some checks to take care of cases where `count=0`. It also has an argument to either return the count column or not. As I mentioned, it is a convenient wrapper. – akrun Jul 31 '15 at 10:14
yes, it's indeed a nice function. I think I will go for this solution, adding the function to my code without installing the whole package (for now, at least)... thanks! – MaZe Jul 31 '15 at 10:40
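
For readers who also want to avoid the package dependency, a small base-R helper along these lines might be enough. This is only a sketch: `expand_rows`, its arguments, and the extra row with a zero count are made up for illustration, and it is not the splitstackshape implementation.

```r
# Hypothetical helper (an assumption, not the package's code): repeat each row
# of `df` by the value in `count_col`, optionally dropping that column.
expand_rows <- function(df, count_col = "count", drop = TRUE) {
  n <- df[[count_col]]
  # Basic checks: counts must be non-missing, non-negative numbers.
  stopifnot(is.numeric(n), !anyNA(n), all(n >= 0))
  out <- df[rep(seq_len(nrow(df)), times = n), , drop = FALSE]
  if (drop) out[[count_col]] <- NULL
  row.names(out) <- NULL
  out
}

# Illustrative data: the middle row has count 0 and should disappear.
demo_df <- data.frame(x = c("A", "B", "C"), lon = c(0, 1, 5),
                      lat = c(0, 2, 5), count = c(3, 0, 2))
res <- expand_rows(demo_df)
res
```

With base `rep()`, a count of 0 simply repeats that row index zero times, so the row is left out of the result without any special handling.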
