0

I basically want do the opposite of ddply(df, columns.to.preserve, numcolwise(FUNCTION).

Suppose I have

d <- data.frame(
  count=c(2,1,3),
  summed.value=c(50,20,30),
  averaged.value=c(35,80,20)
)

      count summed.value averaged.value
1     2           50             35
2     1           20             80
3     3           30             20

I want to do a row expansion of this data.frame based on the count column while specifying what kind of operation I want to apply to the other columns. Here is the kind of result I'm looking for:

> d2
  count summed.value averaged.value
1     1           25             35
2     1           25             35
3     1           20             80
4     1           10             20
5     1           10             20
6     1           10             20

Any there built in functions within dplyr or other packages that does this kind of operation?

Edit: This is different from the De-aggregate / reverse-summarise / expand a dataset in R question because I want to go further and actually apply different functions to columns within the table I wish to expand. There are also more useful and answers on this post.

Community
  • 1
  • 1
Warner
  • 1,353
  • 9
  • 23
  • 4
    Some places to start on expanding the data.frame [here](http://stackoverflow.com/questions/38208529/de-aggregate-reverse-summarise-expand-a-dataset-in-r). Once that is done, just divide `summed.value` and `count` by `count`. – aosmith Jul 28 '16 at 14:57

4 Answers4

2

Use dplyr and tidyr, you can do a rowwise transformation for the summed.value which produces a list for each cell and then unnest the column should give you what you need:

library(dplyr); library(tidyr)
d %>% rowwise() %>% summarise(summed.value = list(rep(summed.value/count, count)), 
                              averaged.value = averaged.value, count = 1) %>% unnest()

# Source: local data frame [6 x 3]

#   averaged.value count summed.value
#            <dbl> <dbl>        <dbl>
# 1             35     1           25
# 2             35     1           25
# 3             80     1           20
# 4             20     1           10
# 5             20     1           10
# 6             20     1           10

Another way is to use data.table, where you can specify the row number as group variable, and the data table will automatically expand it:

library(data.table)
setDT(d)
d[, .(summed.value = rep(summed.value/count, count), averaged.value, count = 1), .(1:nrow(d))]
 [, nrow := NULL][]

#   summed.value averaged.value count
#1:           25             35     1
#2:           25             35     1
#3:           20             80     1
#4:           10             20     1
#5:           10             20     1
#6:           10             20     1
Psidom
  • 209,562
  • 33
  • 339
  • 356
2

There is a function untable in package reshape for getting the inverse of a table. Then divide the variables that need dividing by count via mutate_at (or mutate_each). mutate_at was introduced in dplyr_0.5.0.

First the untable:

library(reshape)
untable(d, num = d$count)

    count summed.value averaged.value
1       2           50             35
1.1     2           50             35
2       1           20             80
3       3           30             20
3.1     3           30             20
3.2     3           30             20

Then the mutate_at for dividing summed.value and count by count:

library(dplyr)

untable(d, num = d$count) %>%
    mutate_at(vars(summed.value, count), funs(./count))

  count summed.value averaged.value
1     1           25             35
2     1           25             35
3     1           20             80
4     1           10             20
5     1           10             20
6     1           10             20
aosmith
  • 34,856
  • 9
  • 84
  • 118
  • Where does `mutate_at` come from? It doesn't appear to be in `plyr` or `dplyr`. Also, do you happen to know why `untable` isn't in `reshape2`? – Warner Jul 28 '16 at 17:05
  • `mutate_at` is in the current version of dplyr. I don't know why `untable` wasn't brought into reshape2. – aosmith Jul 28 '16 at 18:41
1

Here's a both simple and fully vecotrized base R approach

transform(d[rep(1:nrow(d), d$count), ], 
          count = 1, 
          summed.value = summed.value/count)
#     count summed.value averaged.value
# 1       1           25             35
# 1.1     1           25             35
# 2       1           20             80
# 3       1           10             20
# 3.1     1           10             20
# 3.2     1           10             20

Or similarly, using data.table

library(data.table)
res <- setDT(d)[rep(1:.N, count)][, `:=`(count = 1, summed.value = summed.value / count)]
res
#    count summed.value averaged.value
# 1:     1           25             35
# 2:     1           25             35
# 3:     1           20             80
# 4:     1           10             20
# 5:     1           10             20
# 6:     1           10             20 
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
0

A base R solution: It tries to replicate each row by the value of the count column and then divide count and summed.value columns by count.

mytext <- 'count,summed.value,averaged.value
2,50,35
1,20,80
3,30,20'

mydf <-  read.table(text=mytext,header=T,sep = ",")

mydf <- do.call(rbind,apply(mydf, 1, function(x) {
  tempdf <- t(replicate(x[1],x,simplify = T))
  tempdf[,1] <- tempdf[,1]/x[1]
  tempdf[,2] <- tempdf[,2]/x[1]
  return(data.frame(tempdf))
}))

count summed.value averaged.value
     1           25             35
     1           25             35
     1           20             80
     1           10             20
     1           10             20
     1           10             20
Abdou
  • 12,931
  • 4
  • 39
  • 42