I want to count the occurrences of the three factor levels for each column of mydata, so I thought of using the function table.

A sample of mydata:

              A0AUT     A0AYT     A0AZT     A0B2T     A0B3T
100130426 no_change no_change no_change no_change no_change
100133144 no_change no_change      down no_change no_change
100134869 no_change no_change no_change no_change no_change
10357     no_change        up no_change no_change        up
10431     no_change        up no_change no_change no_change
136542    no_change        up no_change no_change no_change
> str(mydata)
'data.frame':   20531 obs. of  518 variables:
 $ A0AUT: Factor w/ 3 levels "down","no_change",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ A0AYT: Factor w/ 3 levels "down","no_change",..: 2 2 2 3 3 3 2 2 2 3 ...
 $ A0AZT: Factor w/ 3 levels "down","no_change",..: 2 1 2 2 2 2 1 2 2 2 ...
 $ A0B2T: Factor w/ 3 levels "down","no_change",..: 2 2 2 2 2 2 1 2 2 2 ...
 $ A0B3T: Factor w/ 3 levels "down","no_change",..: 2 2 2 3 2 2 2 2 2 2 ...
 $ A0B5T: Factor w/ 3 levels "down","no_change",..: 2 2 2 3 2 2 2 2 2 2 ...
 $ A0B7T: Factor w/ 3 levels "down","no_change",..: 2 2 2 2 2 2 1 2 2 2 ...
 $ A0B8T: Factor w/ 3 levels "down","no_change",..: 2 1 1 2 3 2 2 2 2 2 ...
 $ A0BAT: Factor w/ 3 levels "down","no_change",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ A0BCT: Factor w/ 3 levels "down","no_change",..: 2 2 2 2 3 2 2 2 2 2 ...

Now I do:

occurences <- apply(mydata, 1, table)
> occurences[[1]] # 100130426

no_change        up 
      508        10 
> occurences[[2]] # 100133144

     down no_change        up 
       45       446        27 

But I want them as a matrix (or at least I think that would be easier to deal with), so I wrote this:

  freq <- sapply(occurences, function(x){
    c(x, rep(0, 3 - length(x)))
  })

> freq[,1:5]
          100130426 100133144 100134869 10357 10431
no_change       508        45        14     3     3
up               10       446       411   330   268
                  0        27        93   185   247

However, as you can see, the no_change count for 100133144 ended up in the up row!

My expected output would be:

> freq[,1:5]
              100130426 100133144 100134869 10357 10431
    up               10        45        14     3     3
    no_change       508       446       411   330   268
    down              0        27        93   185   247

How can I make sure each count lands in the row of its own level? As you can see, each table may have anywhere from one to three elements, so doing:

freq <- matrix(unlist(occurences), nrow=3)

fails, because the total length is not a multiple of 3.

I might have taken a bad approach to counting the frequencies of mydata by column. I would prefer an approach that uses only base R, without loading any library.

llrs
  • [Please make your question reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610) – Jaap Feb 13 '16 at 14:32
  • `library(reshape2); dcast(melt(mydf, id="id"), value ~ id)` – Jaap Feb 13 '16 at 14:47
  • Can you show the `str(mydata)`. – akrun Feb 13 '16 at 15:23
  • Your expected output may be based on the original dataset. I meant based on the part of the data you showed. – akrun Feb 13 '16 at 15:55
  • @akrun I have included it; it is just a reordering of the first column, or of any column which doesn't have all three levels – llrs Feb 13 '16 at 15:56

2 Answers

We can do this with table. Convert the 'data.frame' to a 'matrix', reshape it from 'wide' to 'long' (using melt from reshape2), and call table on the relevant columns to get the frequency counts.

library(reshape2)
table(melt(as.matrix(mydata))[c(3,1)])
#              Var1
#value       10357 10431 136542 100130426 100133144 100134869
#  down          0     0      0         0         1         0
#  no_change     3     4      4         5         4         5
#  up            2     1      1         0         0         0

Or using only base R, we can just unlist the data to get a vector, replicate the 'row names' to the same length (using row, because unlist walks the data column by column), and then call table

table(unlist(mydata), row.names(mydata)[row(mydata)])
#           
#            100130426 100133144 100134869 10357 10431 136542
#  down              0         1         0     0     0      0
#  no_change         5         4         5     3     4      4
#  up                0         0         0     2     1      1
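A quick way to check that the labels line up with the flattened values is to try the idea on a tiny data frame. The sketch below uses a hypothetical 2 x 3 data frame called toy (not part of the question's data): unlist() flattens it column by column, so the label belonging to each value is the row name at the matching row index.

```r
# Hypothetical toy data, only for illustration
toy <- data.frame(
  A = c("up", "down"),
  B = c("no_change", "down"),
  C = c("no_change", "up"),
  row.names = c("g1", "g2"),
  stringsAsFactors = FALSE
)

# Values in column-major order: A[1], A[2], B[1], B[2], C[1], C[2]
vals <- unlist(toy, use.names = FALSE)

# row(toy) holds the row index of every cell, in the same column-major order
rows <- row.names(toy)[row(toy)]

# Cross-tabulate level by row name
counts <- table(vals, rows)
counts
```

Printing vals and rows side by side (for example with cbind(vals, rows)) makes it easy to confirm that every value is paired with the row it came from.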

Another option uses dplyr/tidyr:

library(dplyr)
library(tidyr)
add_rownames(mydata) %>%
    gather(Var, Val,-rowname) %>% 
    group_by(rowname, Val) %>%
    summarise(n=n()) %>% 
    spread(rowname, n, fill=0)

Update

If the columns of the dataset are factors, we can convert them to character class before doing the unlist

mydata[] <- lapply(mydata, as.character)
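As a small illustration of what that assignment does, here is a sketch on a hypothetical two-column data frame named toy (an assumption, not the question's data). Assigning into toy[] replaces the columns while keeping the data frame itself, including its row names.

```r
# Hypothetical toy data with factor columns
toy <- data.frame(
  A = factor(c("up", "down")),
  B = factor(c("no_change", "no_change")),
  row.names = c("g1", "g2")
)

# Replace every column by its character version; the [] keeps the data.frame shell
toy[] <- lapply(toy, as.character)

str(toy)
```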

Update2

If the counts are needed for each row

library(qdapTools)
t(mtabulate(as.data.frame(t(mydata))))
#          100130426 100133144 100134869 10357 10431 136542
#no_change         5         4         5     3     4      4
#down              0         1         0     0     0      0
#up                0         0         0     2     1      1

Or using only base R: first create a vector of the unique values in the dataset ('nm1'; here they are already known, but otherwise nm1 <- unique(unlist(lapply(mydata, as.character)))). Then loop over the rows using apply with MARGIN=1, convert each row vector to a factor with levels specified as 'nm1', and call tabulate on it. The second argument of tabulate fixes the length of the returned vector at length(nm1), so every row yields a vector of the same length and apply returns a matrix. Finally, assign 'nm1' as the row names (row.names<-).

nm1 <- c('up', 'no_change', 'down')
`row.names<-`(apply(mydata, 1, function(x)
     tabulate(factor(x, levels=nm1),length(nm1))), nm1)
#          100130426 100133144 100134869 10357 10431 136542
#up                0         0         0     2     1      1
#no_change         5         4         5     3     4      4
#down              0         1         0     0     0      0
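A closely related sketch stays nearer to the original apply(mydata, 1, table) call: if each row is turned into a factor with a fixed set of levels first, table() reports a zero for any missing level, so every row gives a vector of the same length with the same names. The data frame toy and the vector lv below are hypothetical names used only for this illustration.

```r
# Hypothetical toy data, only for illustration
toy <- data.frame(
  A = c("no_change", "no_change", "up"),
  B = c("no_change", "down", "up"),
  C = c("no_change", "no_change", "no_change"),
  row.names = c("g1", "g2", "g3"),
  stringsAsFactors = FALSE
)

# The levels, in the order wanted for the rows of the result
lv <- c("up", "no_change", "down")

# table() on a factor with fixed levels always returns one count per level,
# so apply() can simplify the per-row results into a matrix
freq <- apply(toy, 1, function(x) table(factor(x, levels = lv)))
freq
```

Because the names come from the table itself, no padding with zeros and no separate row-name assignment is needed here.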

data

mydata <- structure(list(A0AUT = c("no_change", "no_change", 
"no_change", 
"no_change", "no_change", "no_change"), A0AYT = c("no_change", 
"no_change", "no_change", "up", "up", "up"), A0AZT = c("no_change", 
"down", "no_change", "no_change", "no_change", "no_change"), 
    A0B2T = c("no_change", "no_change", "no_change", "no_change", 
    "no_change", "no_change"), A0B3T = c("no_change", "no_change", 
    "no_change", "up", "no_change", "no_change")),
 .Names = c("A0AUT", 
"A0AYT", "A0AZT", "A0B2T", "A0B3T"), class = "data.frame",
 row.names = c("100130426", 
"100133144", "100134869", "10357", "10431", "136542"))
akrun
  • I would prefer a solution without calling any other library, but good solution anyway – llrs Feb 13 '16 at 15:09
  • `Error in col(occurences) : a matrix-like object is required as argument to 'col'`, not working :( but thanks, (I think the data I posted is the same as I used) but I don't really understand the second argument. – llrs Feb 13 '16 at 15:14
  • @Llopis I posted the data used. It is a `data.frame`. The second argument is actually replicating the row names of mydata to make the lengths equal. If your dataset is `matrix` or `data.frame`, `col(mydata)` gives you the column index. – akrun Feb 13 '16 at 15:15
  • Oh, sorry I first tried with the `occurence` object! it works with the data.frame of course – llrs Feb 13 '16 at 15:20
  • @Llopis BTW, if your object is a `list`, you would get the same error. But, based on the data shown, I assumed that it is either a data.frame or a matrix. For example, `lst <- rep(list(1:3), 4); col(lst) #Error in col(lst) : a matrix-like object is required as argument to 'col'` – akrun Feb 13 '16 at 15:21
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/103379/discussion-between-llopis-and-akrun). – llrs Feb 13 '16 at 15:23
Promoting my comment to an answer:

library(reshape2)
dcast(melt(mydf, id="id"), value ~ id, length)

This assumes that the numbers are stored in an id variable. If they are stored as row names:

dcast(melt(as.matrix(mydf)), value ~ Var1)

Both give:

      value 10357 10431 136542 100130426 100133144 100134869
1      down     0     0      0         0         1         0
2 no_change     3     4      4         5         4         5
3        up     2     1      1         0         0         0
Jaap