1

I have the following matrix, with the results of operations A,B,C,D,E

  Name  result  freq
  A     ok      3
  A     nok     4
  B     ok      5
  B     nok     6
  C     ok      7
  D     nok     8
  E     ok      8
  E     nok     9

I need to create a new table having the following format:

  Name  freok   Frenok
  A       3      4
  B       5      6
  C       7      0
  D       0      8
  E       8      9 

counting for each operation the good and bad results, and if the ok or notok for an operation is missing in the first matrix, then i need to put a Zero in the processed matrix.

I need to process a matrix of about 16 millions of rows

What's the fastest solution?

Alex Fort
  • 93
  • 6
  • In your example data there is no counting. If in your real data there is an actual counting, then the `tydir` answer will fail – David Arenburg Nov 02 '16 at 11:53

2 Answers2

4

We can do this with xtabs from base R

xtabs(freq~Name+result, df1)
#   result
#Name nok ok
#   A   4  3
#   B   6  5
#   C   0  7
#   D   8  0
#   E   9  8

A fast and efficient option would be dcast from data.table

library(data.table)
dcast(as.data.table(df1), Name~ paste0("fre", result), value.var="freq", sum)
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Your solution has a better performance than using the "spread" function. in addition it return numeric values, while spread was returning characters, thanks a lot – Alex Fort Nov 02 '16 at 11:37
3

Using tidyr (part of the tidyverse in R and working with dplyr)

# your data example
df <- structure(
  list(
    Name = c("A", "A", "B", "B", "C", "D", "E", "E"),
    result = c("ok", "nok", "ok", "nok", "ok", "nok", "ok", "nok"),
    freq = c(3L, 4L, 5L, 6L, 7L, 8L, 8L, 9L)
  ),
  class = "data.frame",
  row.names = c(NA,-8L),
  .Names = c("Name", "result", "freq")
)

df
#>   Name result freq
#> 1    A     ok    3
#> 2    A    nok    4
#> 3    B     ok    5
#> 4    B    nok    6
#> 5    C     ok    7
#> 6    D    nok    8
#> 7    E     ok    8
#> 8    E    nok    9

res <- df %>% tidyr::spread(result, freq, fill = 0)

res
#>   Name nok ok
#> 1    A   4  3
#> 2    B   6  5
#> 3    C   0  7
#> 4    D   8  0
#> 5    E   9  8
str(res)
#> 'data.frame':    5 obs. of  3 variables:
#>  $ Name: chr  "A" "B" "C" "D" ...
#>  $ nok : num  4 6 0 8 9
#>  $ ok  : num  3 5 7 0 8
cderv
  • 6,272
  • 1
  • 21
  • 31
  • It worked, even if i had to transpose the results, the SPREAD function returned me the A,B,C,D,E as coulmns and ok and nok as rows, just a minor detail – Alex Fort Nov 02 '16 at 11:01
  • In addition unfortunately it returns the results as colnames values and all in string format – Alex Fort Nov 02 '16 at 11:37
  • I do not understand. `nok` and `ok` are not strings they are numeric. And you do not need to transpose your df. I clarified the answered to show more, but it works as expected – cderv Nov 02 '16 at 11:51