R - ddply summarise using nlevels() does not work

Question

When using the plyr package to summarise my data, it seems impossible to use the nlevels() function.

The structure of my data set is as follows:

> aer <- read.xlsx("XXXX.xlsx", sheetIndex=1)
> aer$ID <- as.factor(aer$ID)
> aer$description <- as.factor(aer$description)
> head(aer)

  ID SOC      start        end days count severity relation
1  1 410 2015-04-21 2015-04-28    7     1        1        3
2  1 500 2015-01-30 2015-05-04   94     1        1        3
3  1 600 2014-11-25 2014-11-29    4     1        1        3
4  1 600 2015-01-02 2015-01-07    5     1        1        3
5  1 600 2015-01-26 2015-03-02   35     1        1        3
6  1 600 2015-04-14 2015-04-17    3     1        1        3

> dput(head(aer,4))
structure(list(ID = structure(c(1L, 1L, 1L, 1L), .Label = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "12", "13", "14", 
"15"), class = "factor"), SOC = c(410, 500, 600, 600),  
start = structure(c(16546, 16465, 16399, 16437), class = "Date"), 
end = structure(c(16553, 16559, 16403, 16442), class = "Date"), 
days = c(7, 94, 4, 5), count = c(1, 1, 1, 1), severity = c(1, 
1, 1, 1), relation = c(3, 3, 3, 3)), .Names = c("ID", "SOC", 
"start", "end", "days", "count", "severity", "relation"
), row.names = c(NA, 4L), class = "data.frame")

What I would like to know is how many levels of the "ID" variable occur in each section of the data when the data set is split by the variable "SOC". I want to summarise this information together with some other variables in a new data set, so I would like to use the plyr package like so:

summaer2 <- ddply(aer, c("SOC"), summarise,
    participants    = nlevels(ID), 
    events          = sum(count),
    min_duration    = min(days), 
    max_duration    = max(days),
    max_severity    = max(severity))

This returns the following error:

Error in Summary.factor(c(4L, 5L, 11L, 11L, 14L, 14L), na.rm = FALSE) : 
‘max’ not meaningful for factors

Could someone give me advice on how to reach my goal? Or what I'm doing wrong?

Many thanks in advance!

Are you sure `nlevels()` is the problem? It seems to be complaining about `max()`. Are you sure `days` and `severity` are numeric? You should share your input data in a [reproducible format](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) (i.e. a `dput()`) so we can see how you imported your data. — MrFlick, Jun 22 '15 at 14:58
@MrFlick I've substituted nlevels() with length(), then it works fine. (But I don't get my levels, just the length of the data sections.... :) ) — RmyjuloR, Jun 22 '15 at 15:01
@Veerendra Gadekar `max(levels(ID))` gives me the same number for every data section, which is not correct. It does not give me an error, though. — RmyjuloR, Jun 22 '15 at 15:04
I think I found the solution: substituting `nlevels(ID)` with `length(unique(ID))` gives me the number of levels per section... — RmyjuloR, Jun 22 '15 at 15:10
@Veerendra Gadekar `max(as.vector(severity))` also gives me incorrect values — RmyjuloR, Jun 22 '15 at 15:20
@RmyjuloR please mention what your desired output should look like — Veerendra Gadekar, Jun 22 '15 at 15:22
@RmyjuloR so maybe you can answer the question yourself to mark the question as answered — Veerendra Gadekar, Jun 22 '15 at 15:40
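
As the first comment points out, the error message names `max()` rather than `nlevels()`. The following is a minimal sketch, using a made-up `severity` vector rather than the original data, of how a numeric column that was stored as a factor can trigger this error, and one common way to convert it back. Whether this is what happened in the original data set is an assumption.

```r
# Hypothetical column: numbers that ended up stored as a factor
severity <- factor(c("1", "2", "10"))

# max() is not defined for unordered factors, so this call fails;
# tryCatch() captures the error message instead of stopping
res <- tryCatch(max(severity), error = function(e) conditionMessage(e))

# Convert via character to recover the original numbers;
# as.numeric() alone would return the internal level codes
severity_num <- as.numeric(as.character(severity))
max(severity_num)
```

Running `sapply(aer, class)` or `str(aer)` on the real data would show which columns are factors.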

score 0 · Answer 1 · answered Jun 22 '15 at 15:55

Update:

Substituting nlevels(ID) with length(unique(ID)) seems to give me the desired output:

> head(summaer2)
   SOC participants events min_duration max_duration max_severity
1  100            4      7            1           62            2
2  410            9     16            1           41            2
3  431            2      2          109          132            1
4  500            5      9           23          125            2
5  600            8     19            1           35            1
6 1040            1      1           98           98            2
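
A likely reason `nlevels(ID)` does not give a per-section count is that a factor keeps its full set of levels when it is subset, so every section reports the same total. The sketch below uses a made-up data frame (`toy`, not the original data) and base R's `tapply()` in place of `ddply()` to show the difference. `droplevels()` is an alternative that was not mentioned in the question or comments.

```r
# Toy data (hypothetical values, not the asker's data set)
toy <- data.frame(
  ID  = factor(c("1", "1", "2", "3", "3", "3")),
  SOC = c(410, 410, 410, 600, 600, 600)
)

# A factor keeps all of its levels when subset, so nlevels() reports
# the total for the whole column in every group
levels_per_group <- tapply(toy$ID, toy$SOC, nlevels)

# Counting distinct values looks only at what is present in each group
distinct_per_group <- tapply(toy$ID, toy$SOC,
                             function(x) length(unique(x)))

# Dropping unused levels first gives the same count via nlevels()
dropped_per_group <- tapply(toy$ID, toy$SOC,
                            function(x) nlevels(droplevels(x)))
```

In the `ddply()` call from the question, this corresponds to writing `participants = length(unique(ID))` or `participants = nlevels(droplevels(ID))`.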
