2

I have a large data frame (n=553), with the Allele column having 3 levels (0,1,2), and the Range Column with 2 levels(Normal and High).

dd <- data.frame(
  Allele = c(0, 0, 1, 1, 2, 2), 
  Range = c("High", "Normal", "High", "Normal", "High", "Normal"), 
  Frequency = c(1L, 2L, 9L, 7L, 28L, 17L)
)

I want to be able to take the entire data frame and turn it into a 2x3 table where the output would have the Allele columns and the Range as rows

The problem I'm getting is that I want to be able to add all the frequency counts that match the row ie if Allele = 0 and Range = Normal, then add the frequencies together for all matching rows.

It feels like there is a simple solution but I can't seem to think of one. Any help is greatly appreciated

MrFlick
  • 195,160
  • 17
  • 277
  • 295
statsbinf
  • 23
  • 2
  • Welcome to SO! Can you show the expected output? (+1 for the reproducible example) – markus Aug 12 '19 at 21:01
  • 1
    The first part of your description seems like a simple reshaping from long to wide data, but what do you mean about adding frequencies? Are you expecting that there would be multiple observations of a given combination of allele and range? If so, you'll need to add some examples to your sample data – camille Aug 12 '19 at 21:03
  • 3
    Maybe `xtabs(Frequency ~ Range + Allele, dd)`. – Rui Barradas Aug 12 '19 at 21:04
  • @camille So for example the first row of the dataframe is 0 Normal 1, say there is another row somewhere down the dataframe with a row 0 Normal 10. The output should add the 1+10 together as well any other row which has 0 and Normal – statsbinf Aug 12 '19 at 21:29
  • @markus I was trying to produce the output but the blockcode wasn't work, i'll try again here ` 0 1 2 Normal 36 178 349 High 18 46 91 ` – statsbinf Aug 12 '19 at 21:30
  • 2
    `reshape2::dcast(data = dd, formula = Allele ~ Range, value.var = "Frequency", fun.aggregate = sum)` – d.b Aug 12 '19 at 21:30
  • @d.b Yes that works perfect! Thank you so much!!! – statsbinf Aug 12 '19 at 21:37
  • @RuiBarradas This is exactly what i was looking for!! Thank you so much! – statsbinf Aug 12 '19 at 21:38
  • It's just reshaping problem it seems. `tidyr::spread(dd, Allele, Frequency)` – Ronak Shah Aug 13 '19 at 03:41

1 Answers1

0

1

m = sapply(unique(dd$Allele), function(x)
    sapply(unique(dd$Range), function(y){
        sum(dd$Frequency[dd$Allele == x & dd$Range == y])
    }))
dimnames(m) = list(Range = unique(dd$Range), Allele = unique(dd$Allele))
m
#        Allele
#Range    0 1  2
#  High   1 9 28
#  Normal 2 7 17

2

with(dd, tapply(Frequency, list(Range, Allele), sum))
#       0 1  2
#High   1 9 28
#Normal 2 7 17

3

library(reshape2)
dcast(dd, Range ~ Allele, value.var = "Frequency", fun.aggregate = sum)
#   Range 0 1  2
#1   High 1 9 28
#2 Normal 2 7 17
d.b
  • 32,245
  • 6
  • 36
  • 77