Counting and reorganizing data in a R matrix

Question

I have the following matrix, with the results of operations A,B,C,D,E

  Name  result  freq
  A     ok      3
  A     nok     4
  B     ok      5
  B     nok     6
  C     ok      7
  D     nok     8
  E     ok      8
  E     nok     9

I need to create a new table having the following format:

  Name  freok   Frenok
  A       3      4
  B       5      6
  C       7      0
  D       0      8
  E       8      9

counting for each operation the good and bad results, and if the ok or notok for an operation is missing in the first matrix, then i need to put a Zero in the processed matrix.

I need to process a matrix of about 16 millions of rows

What's the fastest solution?

In your example data there is no counting. If in your real data there is an actual counting, then the `tydir` answer will fail — David Arenburg, Nov 02 '16 at 11:53

score 4 · Accepted Answer · answered Nov 02 '16 at 10:11

4

We can do this with xtabs from base R

xtabs(freq~Name+result, df1)
#   result
#Name nok ok
#   A   4  3
#   B   6  5
#   C   0  7
#   D   8  0
#   E   9  8

A fast and efficient option would be dcast from data.table

library(data.table)
dcast(as.data.table(df1), Name~ paste0("fre", result), value.var="freq", sum)

answered Nov 02 '16 at 10:11

akrun

874,273
37
540
662

Your solution has a better performance than using the "spread" function. in addition it return numeric values, while spread was returning characters, thanks a lot – Alex Fort Nov 02 '16 at 11:37

cderv · Answer 2 · 2016-11-02T11:53:29.870

3

Using tidyr (part of the tidyverse in R and working with dplyr)

# your data example
df <- structure(
  list(
    Name = c("A", "A", "B", "B", "C", "D", "E", "E"),
    result = c("ok", "nok", "ok", "nok", "ok", "nok", "ok", "nok"),
    freq = c(3L, 4L, 5L, 6L, 7L, 8L, 8L, 9L)
  ),
  class = "data.frame",
  row.names = c(NA,-8L),
  .Names = c("Name", "result", "freq")
)

df
#>   Name result freq
#> 1    A     ok    3
#> 2    A    nok    4
#> 3    B     ok    5
#> 4    B    nok    6
#> 5    C     ok    7
#> 6    D    nok    8
#> 7    E     ok    8
#> 8    E    nok    9

res <- df %>% tidyr::spread(result, freq, fill = 0)

res
#>   Name nok ok
#> 1    A   4  3
#> 2    B   6  5
#> 3    C   0  7
#> 4    D   8  0
#> 5    E   9  8
str(res)
#> 'data.frame':    5 obs. of  3 variables:
#>  $ Name: chr  "A" "B" "C" "D" ...
#>  $ nok : num  4 6 0 8 9
#>  $ ok  : num  3 5 7 0 8

edited Nov 02 '16 at 11:53

answered Nov 02 '16 at 10:17

cderv

6,272
1
21
31

It worked, even if i had to transpose the results, the SPREAD function returned me the A,B,C,D,E as coulmns and ok and nok as rows, just a minor detail – Alex Fort Nov 02 '16 at 11:01
In addition unfortunately it returns the results as colnames values and all in string format – Alex Fort Nov 02 '16 at 11:37
I do not understand. `nok` and `ok` are not strings they are numeric. And you do not need to transpose your df. I clarified the answered to show more, but it works as expected – cderv Nov 02 '16 at 11:51

Counting and reorganizing data in a R matrix

2 Answers2