I found several questions and answers on this topic, but I was not able to resolve my problem, so I'll ask it my own way. Sorry if it is obvious.

I prepared a dataframe (z2) to use with ggplot2. This dataframe contains a column "value" with reals, a column "name" which identifies the parameter tested, a column "loghos" which is the unique ID for each individual, and a column "statut" which is the outcome variable.

I was able to easily plot "value" against "statut" for each "name" with the following code:

pt <- ggplot(z2, aes(y = value, x = statut))
pt + geom_boxplot(aes(colour=statut)) + facet_wrap(~name, scales="free_y")

In the dataframe, I have repeated values for each "name" for each "loghos" (several samples at various timepoints).

I would like to plot only the minimal value for each "name" and for each patient ("loghos"). So I tried to use plyr for this, and I wrote:

x = ddply(z2, .(loghos,name), function(x) return(min(x,na.rm=T)))

However, I got this error message, and now I don't know what to do:

Error in FUN(X[[1L]], ...) : 
  only defined on a data frame with all numeric variables

I'm sure it is no big deal, but I can't find how to write it correctly!

Thanks in advance,

Julien

Edit: a sample of the data.frame is provided below.

y = z2[sample(nrow(z2),20),c(1,2,3,9,11,12,13)]
y
      cleBilan   name  value   loghos sexe age    statut
80612   328347   plaq 384.00 31218139    M  21 transfert
36304   363835     gb   5.62 41416171    M  72   service
59346   267744 lympho   9.90 30628552    F  22   service
62746   388270 lympho   8.70 41620223    M  78   service
81046   342228   plaq 185.00 41120284    M  19   service
67400   323055   mono   3.10 31273421    F  45   service
35572   335928     gb  16.16 41178061    F  22 transfert
71136   256960 neutro  10.65 30401746    M  71 transfert
34324   293368     gb  16.20 30894579    F  30   service
69010   383939   mono   6.90 41574890    M  22   service
63665   236360   mono   4.40 29970714    M  71   service
31366   233999     gb   7.20 29959612    F  18   service
60867   317932 lympho  12.00 31229099    M  50   service
74487   355581 neutro  10.68 41154330    F  23   service
65520   265864   mono   7.00 30583193    M  78   service
36553   375590     gb   7.10 41489078    M  61   service
65849   268730   mono   3.90 30652360    M  89     deces
80813   354964   plaq 404.00 41120284    M  19   service
31271   232524     gb   6.30 29934806    M  36   service
72789   291335 neutro  11.00 30923095    F  35   service
jtextori
  • 3
    hi julien, are you able to provide [some sample data](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? :) – Anthony Damico Dec 20 '12 at 08:41
  • 3
    maybe `ddply(z2, .(loghos, name), summarise, value = min(value,na.rm=T))`, it's hard to tell without the data – baptiste Dec 20 '12 at 08:54
  • 1
    Or `ddply(z2, .(loghos,name), function(x) min(x$value,na.rm=T))`. The argument passed by `ddply` to the anonymous function is a data.frame. – Roland Dec 20 '12 at 09:00
  • @baptiste: Thanks! Your function works, but I don't understand exactly what the difference is. `ddply` is meant to be applied to a data.frame, isn't it? So why was it complaining that it required only numeric data? And, as I wanted to keep the "statut" column, I ran `x = ddply(z2, .(loghos, name, statut), summarise, value = min(value,na.rm=T))`, which contains the statut column with strings, and it worked? – jtextori Dec 20 '12 at 09:08
  • 1
    `ddply` splits the data into small blocks; you need to provide it with a function that takes a block (data.frame) as input and returns a data.frame. `summarise` is one such function, but `min` itself isn't. – baptiste Dec 20 '12 at 09:24
  • 1
    someone could post one of these comments (or synthesize them) as an answer ... (@jtextori, you're allowed to post an answer to your own question ...) – Ben Bolker Dec 20 '12 at 14:09
  • @ben: I did not think I could, but with pleasure! – jtextori Dec 21 '12 at 06:04

1 Answer

Summary of the answer from the comments:

As commented by baptiste: "`ddply` splits the data into small blocks; you need to provide it with a function that takes a block (data.frame) as input and returns a data.frame. `summarise` is one such function, but `min` itself isn't".

So, to reduce the initial data.frame, the correct code was:

x = ddply(z2, .(loghos, name, statut), summarise, value = min(value,na.rm=T))

In this function, only `loghos`, `name` and `value` are used, because `statut` is unique for each `loghos`. I added it to the list to keep its value in the `x` data.frame, as it is the outcome variable.
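
To make the difference concrete, here is a minimal sketch on a small made-up data.frame. The `toy` data and its values are assumptions for illustration only; just the column names match `z2`. It shows that calling `min()` on a whole block fails as soon as the block contains non-numeric columns, while `summarise` (or a function that picks out the `value` column, as Roland suggested) works.

```r
library(plyr)

# Hypothetical toy data: same column names as z2, invented values
toy <- data.frame(
  loghos = c(1, 1, 1, 2, 2),
  name   = c("gb", "gb", "plaq", "gb", "gb"),
  value  = c(5.6, 7.2, 185, NA, 16.2),
  statut = c("service", "service", "service", "transfert", "transfert"),
  stringsAsFactors = FALSE
)

# ddply hands each block to the function as a data.frame.
# min() on a data.frame with character columns raises an error,
# which is what happened with function(x) min(x, na.rm = TRUE).
block <- toy[toy$loghos == 1 & toy$name == "gb", ]
res <- try(min(block, na.rm = TRUE), silent = TRUE)
inherits(res, "try-error")

# summarise evaluates min() on the value column of each block only
mins <- ddply(toy, .(loghos, name, statut), summarise,
              value = min(value, na.rm = TRUE))
mins

# Roland's variant: extract the column yourself inside the function;
# the result column then gets an automatic name instead of "value"
mins2 <- ddply(toy, .(loghos, name), function(d) min(d$value, na.rm = TRUE))
mins2
```

The reduced data.frame can then be passed to the original `ggplot` call in place of `z2`, since it still has the `value`, `name` and `statut` columns.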

jtextori