
This may have been already solved somewhere else, but I can't find any specific link, so I'll be happy to see a "duplicated" tag...

I have a dataframe with rows that go like the following one:

  x y z lon lat count
1 A B C   0   0     3
2 B D Q   1   2     2

Now, to plot the data with ggmap (I'm new and still learning the grammar of graphics), specifically with stat_bin2d, I think I have to transform the data above into the following form:

  x y z lon lat 
1 A B C   0   0
2 A B C   0   0
3 A B C   0   0
4 B D Q   1   2
5 B D Q   1   2

Questions:

1) Is my assumption correct?

2) How can I reach my goal?

I've tried several ways to use rbind without a for loop, but none of them worked. With my limited knowledge of R, the only approach I can think of is something along the lines of

my_df <- structure(list(x = structure(1:2, .Label = c("A", "B"), class = "factor"), 
                        y = structure(1:2, .Label = c("B", "D"), class = "factor"), 
                        z = structure(1:2, .Label = c("C", "Q"), class = "factor"), 
                        lon = c(0, 1), lat = c(0, 2), count = c(3, 2)), 
                   .Names = c("x", "y", "z", "lon", "lat", "count"), 
row.names = 1:2, class = "data.frame")

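# seq_len() skips the inner loop when count is 1; 1:(count - 1) would run over c(1, 0)
for (i in seq_len(nrow(my_df))) {
    for (j in seq_len(my_df$count[i] - 1)) {
        my_df <- rbind(my_df, my_df[i, ])
    }
}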
row.names(my_df) <- 1:nrow(my_df)
my_df <- my_df[,1:5]

Result is:

  x y z lon lat
1 A B C   0   0
2 B D Q   1   2
3 A B C   0   0
4 A B C   0   0
5 B D Q   1   2

It works, but I'd like to learn a better way to reach my goal.

Jaap
MaZe

2 Answers


You can do:

my_df[rep(seq_len(nrow(my_df)), times = my_df$count), ]

See this post
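To make the indexing step explicit, here is a minimal sketch using the question's data (rebuilt with data.frame() for brevity). The names idx and expanded are just illustrative. When rep() gets a vector for times, it repeats each element by the matching entry, so the result is a vector of row numbers that can be used to subset the data frame:

```r
# Same two-row data as in the question
my_df <- data.frame(x = c("A", "B"), y = c("B", "D"), z = c("C", "Q"),
                    lon = c(0, 1), lat = c(0, 2), count = c(3, 2))

# Row 1 is repeated 3 times and row 2 twice, following my_df$count
idx <- rep(seq_len(nrow(my_df)), times = my_df$count)
idx
# [1] 1 1 1 2 2

# Subset by the repeated row numbers and leave out the count column
expanded <- my_df[idx, names(my_df) != "count"]

# Reset the row names, which would otherwise read 1, 1.1, 1.2, ...
rownames(expanded) <- NULL
expanded
```

Dropping the count column and resetting the row names are optional extras on top of the one-liner above.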

Victorp
  • I like the beauty and simplicity of this solution, but I don't really get all of how it works. It "subsets" `my_df` (actually, expanding it) including all columns (and this can be improved to drop the last one with the values), but how does it know how many times to repeat each row of `my_df` with just a single call to `my_df$count`? – MaZe Jul 31 '15 at 10:16
  • I got it now. The single call returns a vector of values, and that's how it replicates each row the exact number of times. Thanks, this is beautiful. – MaZe Jul 31 '15 at 10:39

We can use the convenient expandRows function from splitstackshape to replicate the rows according to the 'count' column.

library(splitstackshape)
res <- expandRows(my_df, 'count')
row.names(res) <- NULL
res
#  x y z lon lat
#1 A B C   0   0
#2 A B C   0   0
#3 A B C   0   0
#4 B D Q   1   2
#5 B D Q   1   2
akrun
  • Thanks. It seems, anyway, that what the function `expandRows` does is essentially the same as the simpler solution above, in a more elaborate and "professional" way: see https://github.com/mrdwab/splitstackshape/blob/master/R/expandRows.R – MaZe Jul 31 '15 at 10:12
  • @MaZe It has some additional functionality, as well as checks to take care of cases where `count=0`. It also has an argument to control whether the count column is returned. As I mentioned, it is a convenient wrapper. – akrun Jul 31 '15 at 10:14
  • yes, it's indeed a nice function. I think I will go for this solution, adding the function to my code without installing the whole package (for now, at least)... thanks! – MaZe Jul 31 '15 at 10:40
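For anyone who wants a standalone helper instead of the package, a minimal base-R sketch could look like the one below. This is not the splitstackshape source: expand_rows, its drop argument, and the demo data are hypothetical names made up for illustration, and the validation is only my guess at what is useful to check.

```r
# Hypothetical helper (NOT the splitstackshape implementation):
# repeat each row of df as many times as given in the column count_col.
expand_rows <- function(df, count_col, drop = TRUE) {
  n <- df[[count_col]]
  if (is.null(n)) stop("column '", count_col, "' not found")
  if (anyNA(n) || any(n < 0) || any(n != round(n))) {
    stop("counts must be non-negative whole numbers")
  }
  # rep() with times = 0 simply leaves that row out
  out <- df[rep(seq_len(nrow(df)), times = n), , drop = FALSE]
  # Optionally remove the count column from the result
  if (drop) out <- out[, names(out) != count_col, drop = FALSE]
  rownames(out) <- NULL
  out
}

# Made-up data with a zero count in the second row
demo <- data.frame(x = c("A", "B", "C"), lon = c(0, 1, 5), count = c(3, 0, 2))
res <- expand_rows(demo, "count")
res
```

With this sketch, rows whose count is 0 disappear from the result, and passing drop = FALSE keeps the count column.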