
I have a data frame with 10 variables, one of which is the station ID. There are 10 stations (A through J), each with daily observations over five years, for 18260 observations in total. Many days have NA values. My data frame looks like this:

stationID      x1      x2     x3      x4       x10
A
A
A
A
B
B
B
B
C
C
C
C

J
J
J
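The comments below ask for a minimal reproducible example. A small hypothetical stand-in with the same shape (a stationID column plus numeric columns containing NA) might look like this. The values are made up and are not taken from the real data; `dput()` prints R code that recreates the object, which is the usual way to paste data into a question.

```r
# Hypothetical toy data: 3 stations instead of 10, 4 days each,
# two numeric variables with some missing (NA) values
mydf <- data.frame(
  stationID = rep(c("A", "B", "C"), each = 4),
  x1 = c(1.2, NA, 3.4, 2.2, 5.0, 4.1, NA, NA, 0.5, 0.7, 0.9, 1.1),
  x2 = c(10, 12, NA, 11, 20, 21, 22, 19, NA, 30, 31, 29)
)

str(mydf)              # structure: 12 rows, 3 columns
colSums(is.na(mydf))   # number of NA values in each column
dput(mydf)             # paste-able code that recreates mydf
```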

I want summary statistics (mainly n, mean, median and sd) for each variable, grouped by station.

I tried:

library(psych)
describeBy(mydf, mydf$stationID)

This gives me summary statistics for all variables, but as a separate table for each station. Instead, I want a single table like this:

                 station A   station B   station C   station J
Var.2    n
         mean
         sd
         median
Var.3    n
         mean
         sd
         median

Var.10   n
         mean
         sd
         median

I also did not get summary statistics for variables that contain NA. How can I get output in the layout shown above for all variables, including those that contain NA?
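One way to get close to that layout in base R is to compute the four statistics per station with NA removed (counting n as the number of non-missing values), build one statistics-by-station block per variable, and stack the blocks. This is a minimal sketch on hypothetical toy data; it assumes every column except stationID is numeric, and the helper name `stats4` is made up for illustration.

```r
# Hypothetical toy data standing in for the real mydf
# (3 stations instead of 10, two numeric variables with NA)
mydf <- data.frame(
  stationID = rep(c("A", "B", "C"), each = 4),
  x1 = c(1.2, NA, 3.4, 2.2, 5.0, 4.1, NA, NA, 0.5, 0.7, 0.9, 1.1),
  x2 = c(10, 12, NA, 11, 20, 21, 22, 19, NA, 30, 31, 29)
)

# Hypothetical helper: four statistics for one vector.
# n counts only non-missing values; na.rm = TRUE drops NA for the rest
stats4 <- function(x) {
  c(n      = sum(!is.na(x)),
    mean   = mean(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE))
}

# Assumes every column except stationID is numeric
vars <- setdiff(names(mydf), "stationID")

# One 4 x (number of stations) matrix per variable:
# rows are statistics, columns are stations
per_var <- lapply(vars, function(v) {
  sapply(split(mydf[[v]], mydf$stationID), stats4)
})

# Stack the blocks vertically and label rows "variable.statistic"
result <- do.call(rbind, per_var)
rownames(result) <- paste(rep(vars, each = 4), rownames(result), sep = ".")
colnames(result) <- paste("station", colnames(result))
round(result, 2)
```

Rows come out as `x1.n`, `x1.mean`, and so on, with one `station …` column per station. If a station has no non-missing values for a variable, `mean()` returns NaN and `sd()`/`median()` return NA instead of failing, so every variable still appears in the table.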

I believe this question differs from the suggested duplicates because it asks for more than one statistic and my observations contain NA.

user1237585
  • Something along the lines of: `do.call("cbind", tapply( mydf, mydf$stationID, function(x) (list( n=sum(!is.na(x), mean=mean(x,na.rm=TRUE), sd=sd(x,na.rm=TRUE), median(x, na.rm=TRUE) )} ) )` – IRTFM Nov 03 '15 at 07:52
  • @BondedDust Thanks for your help. I got this error --> Error :unexpected '}' in "do.call – user1237585 Nov 03 '15 at 08:01
  • I removed the } and now R shows a + prompt after I run the line above – user1237585 Nov 03 '15 at 08:02
  • I didn't think the cited question/answers necessarily led to an obvious solution, but am not voting to reopen because there was no MWE and I really deplore problem descriptions that have triple dots or etc's in them. The code I offered was not tested .... since there was nothing to test it on. – IRTFM Nov 03 '15 at 16:15
  • @BondedDust Thanks for your help. You can find the data at this link: https://goo.gl/Y4Q9JA Since I'm new to R and Stack Overflow, I won't use triple dots or "etc." in future questions. I did test your code on my dataset. I'd be grateful if you could test it on the data. Thanks again – user1237585 Nov 03 '15 at 23:03
  • Sorry. I really do not see why a 2 MB file is needed to do testing. – IRTFM Nov 04 '15 at 05:46
  • @BondedDust Thanks, I'll keep that in mind for the future. Please find a smaller file for testing at this link: https://goo.gl/Y4Q9JA – user1237585 Nov 04 '15 at 21:23
  • @42- Did you manage to test your code on the data https://www.dropbox.com/s/5xm978yjcozxb00/mydf.csv?dl=0 ? – user1237585 Nov 09 '15 at 21:56
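The first comment's idea (compute the statistics per station with `tapply()`, then bind the results with `do.call("cbind", ...)`) can be made to parse and run. As posted, it has unbalanced parentheses and a stray `}`, which would account for both the "unexpected '}'" error and the `+` continuation prompt, and it passes the whole data frame to `tapply()`, which works on a single vector. A minimal corrected sketch for one column, on hypothetical toy data (`x1` is an assumed column name):

```r
# Hypothetical toy data with an assumed numeric column x1
mydf <- data.frame(
  stationID = rep(c("A", "B", "C"), each = 4),
  x1 = c(1.2, NA, 3.4, 2.2, 5.0, 4.1, NA, NA, 0.5, 0.7, 0.9, 1.1)
)

# tapply() splits one vector by station; because the function returns a
# length-4 vector, the result is a one-dimensional list array with one
# element per station (named A, B, C)
by_station <- tapply(mydf$x1, mydf$stationID, function(x) {
  c(n      = sum(!is.na(x)),
    mean   = mean(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE))
})

# Bind the per-station vectors as columns: rows are statistics
x1_table <- do.call("cbind", by_station)
x1_table
```

To cover every variable, the same call can be repeated for each numeric column (for example with `lapply()` over the column names) and the resulting blocks stacked with `rbind()`.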

0 Answers