R - sum #occurrences of a string for each unique person

Question

I want to transform this

     Person    Error_Type
[1,]  Name_1      Type_A
[2,]  Name_2      Type_B
[3,]  Name_1      Type_A
[4,]  Name_3      Type_C
[5,]  Name_2      Type_C
[6,]  Name_1      Type_B

Into this:

      Person     Type_A     Type_B     Type_C
[1,]  Name_1       2          1          -
[2,]  Name_2       -          1          1
[3,]  Name_3       -          -          1

Both the Name_ and Type_ values are strings.

Thanks!

First, thanks for your reply @Zach; I'm really just starting with R. I know that I can use unique(db$Person) to get all unique values in the Person column, but I'm not sure how to count the occurrences of each unique(db$Error_Type) value for each unique(db$Person). – EduGord, Nov 02 '16 at 01:42
Do note: your post shows a matrix and not a data frame. A matrix cannot hold different data types; all of its elements share one type. Prior to aggregation, convert it to a data frame (a list of equal-length atomic vectors), which can have heterogeneous types. – Parfait, Nov 02 '16 at 02:26

Joseph Wood · Accepted Answer · 2016-11-02T04:04:05.860

As @thelatemail pointed out for the example given, table gives almost exactly what the OP requested in one function call.

df <- data.frame(Person = c("Name_1","Name_2","Name_1","Name_3","Name_2","Name_1"),
                 Error_Type = c("Type_A","Type_B","Type_A","Type_C","Type_C","Type_B"),
                 stringsAsFactors = FALSE)

table(df)
        Error_Type
Person   Type_A Type_B Type_C
  Name_1      2      1      0
  Name_2      0      1      1
  Name_3      0      0      1
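
If a plain data frame with a Person column is wanted, as in the question, the table can be converted afterwards. The following is only a sketch: the `Other` column is a made-up stand-in for the extra columns the OP mentions, which is why just the two relevant columns are passed to table.

```r
df <- data.frame(
  Person = c("Name_1", "Name_2", "Name_1", "Name_3", "Name_2", "Name_1"),
  Error_Type = c("Type_A", "Type_B", "Type_A", "Type_C", "Type_C", "Type_B"),
  Other = seq_len(6),  # hypothetical extra column, ignored below
  stringsAsFactors = FALSE
)

# Cross-tabulate only the two columns of interest
counts <- table(df$Person, df$Error_Type)

# Turn the table into a wide data frame and move the row names into a column
wide <- as.data.frame.matrix(counts)
wide <- cbind(Person = rownames(wide), wide, stringsAsFactors = FALSE)
rownames(wide) <- NULL

wide
#   Person Type_A Type_B Type_C
# 1 Name_1      2      1      0
# 2 Name_2      0      1      1
# 3 Name_3      0      0      1
```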

However, the OP has stated that the actual data is a bit more complex than the given example. Below is a base R solution that should work on a more general level.

MakeDf <- function(myDf) {
    myCols <- unique(myDf$Error_Type)
    z <- split(myDf, myDf$Person)
    lenR <- length(z)
    # one row per person, one column per error type, initialised to 0
    newDf <- data.frame(matrix(0, nrow = lenR, ncol = length(myCols)))
    colnames(newDf) <- myCols; rownames(newDf) <- names(z)
    for (i in seq_len(lenR)) {
        # sort first: rle only counts runs of consecutive equal values
        runs <- rle(sort(z[[i]]$Error_Type))
        newDf[i, runs$values] <- runs$lengths
    }
    newDf
}

MakeDf(df)
       Type_A Type_B Type_C
Name_1      2      1      0
Name_2      0      1      1
Name_3      0      0      1

This function uses split to group the rows by person and rle to count each error type within a group (much as table does).
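
One caveat worth illustrating: rle counts runs of consecutive equal values, so it only agrees with table when equal values sit next to each other, for example after sorting. A small sketch on a made-up vector:

```r
x <- c("Type_A", "Type_B", "Type_A")

# Unsorted: three separate runs, each of length 1
r_unsorted <- rle(x)

# Sorted: equal values are adjacent, so the run lengths are the counts
r_sorted <- rle(sort(x))
r_sorted$values   # "Type_A" "Type_B"
r_sorted$lengths  # 2 1

# table gives the same counts regardless of order
tab <- table(x)
```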

score 0 · Answer 2 · answered Nov 02 '16 at 01:52

Try this... (myMat is the matrix described above)

myMat <- data.frame(myMat)
aggregate(myMat, by = list(myMat$Error_Type), FUN = length)

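aggregate applies the function given in FUN to the first argument, grouped by the variables listed in the "by" argument. As written, the call groups by Error_Type only; adding myMat$Person to the "by" list gives counts per person and error type.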

honeyBunchesOfFloats

`aggregate(dat["Error_Type"], dat, FUN = length)` would be simpler if you want this format. Excluding the obvious `table(dat)` to get a result similar to what OP requested. – thelatemail Nov 02 '16 at 02:40
@thelatemail Entirely accurate; I'm merely a creature of habit, and this was the first way I learned to aggregate data. I'm reading up on `table()` as we speak, as it looks like it could save me some keystrokes going forward. – honeyBunchesOfFloats Nov 02 '16 at 02:48
@honeyBunchesOfFloats @thelatemail My actual database has other columns, so this is almost working but not quite. The closest I could get to what I want was this code: `aggregate(db$Person, by = list(db$Person,db$Error_Type), FUN = length)`. It gives me a table with the headers 'Group.1', 'Group.2' and 'x', where 'Group.1' is the name of the person, 'Group.2' is the type of error and 'x' is the total number of occurrences. What I want as headers is 'Person', 'Error Type A', 'Error Type B', ... and only one row for each person. – EduGord Nov 02 '16 at 02:54
Perhaps try looking at the melt function in the reshape package. – honeyBunchesOfFloats Nov 02 '16 at 03:14
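
For the long 'Group.1' / 'Group.2' / 'x' result described in the comments, one possible base R route to a single row per person is to name the grouping variables in `by` and then spread the counts with xtabs. This is only a sketch on the example data; `db` here is a stand-in for the OP's real database.

```r
db <- data.frame(
  Person = c("Name_1", "Name_2", "Name_1", "Name_3", "Name_2", "Name_1"),
  Error_Type = c("Type_A", "Type_B", "Type_A", "Type_C", "Type_C", "Type_B"),
  stringsAsFactors = FALSE
)

# Naming the elements of `by` replaces the default Group.1 / Group.2 headers
long <- aggregate(db$Person,
                  by = list(Person = db$Person, Error_Type = db$Error_Type),
                  FUN = length)

# Spread the counts in column x: one row per person, one column per error type;
# combinations that never occur are filled with 0
wide <- xtabs(x ~ Person + Error_Type, data = long)
wide
```

The intermediate aggregate step is not strictly needed, since table(db$Person, db$Error_Type) produces the same counts directly, but it shows how to get from the output in the comment to the requested layout.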
