Embarrassingly basic question, but if you don't know... I need to expand a data.frame of count-summarised data back into what it looked like before being summarised. This is essentially the reverse of {plyr} count(), e.g.:

> (d = data.frame(value=c(1,1,1,2,3,3), cat=c('A','A','A','A','B','B')))
  value cat
1     1   A
2     1   A
3     1   A
4     2   A
5     3   B
6     3   B
> (summry = plyr::count(d))
  value cat freq
1     1   A    3
2     2   A    1
3     3   B    2

If you start with summry, what is the quickest way back to d? Unless I'm mistaken (very possible), {reshape2} doesn't do this.

geotheory

2 Answers

Just use rep:

summry[rep(rownames(summry), summry$freq), c("value", "cat")]
#     value cat
# 1       1   A
# 1.1     1   A
# 1.2     1   A
# 2       2   A
# 3       3   B
# 3.1     3   B
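
The same idea can also be sketched with integer row indices, which makes it easy to reset the row names so the result matches the original d. This is only a minimal sketch: `idx` and `expanded` are illustrative names, and `summry` is rebuilt by hand here (an assumption, so the snippet runs without plyr).

```r
# Rebuild the summary by hand, in the same shape as the plyr::count(d) output
summry <- data.frame(value = c(1, 2, 3),
                     cat   = c("A", "A", "B"),
                     freq  = c(3, 1, 2))

# Repeat each row index freq times: 1 1 1 2 3 3
idx <- rep(seq_len(nrow(summry)), times = summry$freq)

# Index the rows, keep only the data columns, then reset the row names
expanded <- summry[idx, c("value", "cat")]
rownames(expanded) <- NULL
expanded
```

Setting the row names to NULL swaps the 1, 1.1, 1.2 style names for plain increasing integers. A row whose freq is 0 is simply dropped, since rep() repeats it zero times.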

A variation of this approach can be found in expandRows from my "SOfun" package. If you had that loaded, you would be able to simply do:

expandRows(summry, "freq")
A5C1D2H2I1M1N2O1R2T1

The R Cookbook website has a good table-to-data-frame function that you can modify slightly. The only modifications were changing 'Freq' to 'freq' (to be consistent with plyr::count) and making sure the row names are reset to increasing integers.

expand.dft <- function(x, na.strings = "NA", as.is = FALSE, dec = ".") {
  # Take each row in the source data frame table and replicate it
  # using the freq value
  DF <- sapply(1:nrow(x), 
               function(i) x[rep(i, each = x$freq[i]), ],
               simplify = FALSE)

  # Take the above list and rbind it to create a single DF
  # Also subset the result to eliminate the freq column
  DF <- subset(do.call("rbind", DF), select = -freq)

  # Now apply type.convert to the character coerced factor columns  
  # to facilitate data type selection for each column 
  for (i in 1:ncol(DF)) {
    DF[[i]] <- type.convert(as.character(DF[[i]]),
                            na.strings = na.strings,
                            as.is = as.is, dec = dec)
  }
  row.names(DF) <- seq(nrow(DF))
  DF
}

expand.dft(summry)

  value cat
1     1   A
2     1   A
3     1   A
4     2   A
5     3   B
6     3   B
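
If you want to convince yourself that an expansion like this really undoes the counting, a round trip is a quick check. The sketch below is base R only and makes some assumptions: it uses aggregate() as a stand-in for plyr::count (not the same function, just a similar result for this data), and `back` is an illustrative name.

```r
d <- data.frame(value = c(1, 1, 1, 2, 3, 3),
                cat   = c("A", "A", "A", "A", "B", "B"))

# Count duplicates in base R: one row per unique (value, cat) combination
summry <- aggregate(freq ~ value + cat, data = cbind(d, freq = 1), FUN = sum)

# Expand back out by repeating each row freq times, then reset the row names
back <- summry[rep(seq_len(nrow(summry)), summry$freq), c("value", "cat")]
rownames(back) <- NULL

# Compare with the original; this relies on d already being sorted by group
all.equal(back, d)
```

Note that the expansion can only recover the rows, not their original order, so for unsorted data you would need to sort both data frames before comparing.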
cdeterman