1

I am new to Stack Overflow and to R programming, so please forgive me if the question sounds a bit silly.

What I would like to ask anyone in the know is if it is possible to display multiple summaries from just one code command.

Just to give an example of what I am trying to achieve: The data frame consists of daily climate data over a number of years (includes around 6 various variables)

sub <- subset(data, Month == "Sep" & Day==2, !is.na(data), select = MSLP:Temp)
summary(sub,mean)

      MSLP        Direction         Speed             Temp      
 Min.   : 976   Min.   :  8.4   Min.   : 1.680   Min.   : 8.18  
 1st Qu.:1007   1st Qu.:167.8   1st Qu.: 6.095   1st Qu.:13.04  
 Median :1016   Median :229.7   Median :10.010   Median :14.73  
 Mean   :1014   Mean   :213.0   Mean   :10.042   Mean   :14.68  
 3rd Qu.:1022   3rd Qu.:270.4   3rd Qu.:13.320   3rd Qu.:16.40  
 Max.   :1034   Max.   :353.6   Max.   :25.640   Max.   :21.58 

All good so far. But is it possible to add something to the above code to display a summary for each day over a set period, say from day 2 through to day 10?

Also, would it be possible to add another criterion to the above code to select a specific year? I.e.:

sub <- subset(data, Month == "Sep" & Day==2 - include year etc.

as I just cannot figure it out at all. For example, if I do

sub <- subset(data, Month == "Sep" & Day==2 & Year == 1967 #etc ...)

I just get an error code like this:

Error in eval(expr, envir, enclos)

Apologies again if these questions seem a little idiotic but if anybody has any solutions to the above I would be very grateful.

MrFlick
Paddy J
  • Please include sample input data and be clear on what you want the desired output to be. You should always try to include a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) when asking questions. But it sounds like the `aggregate` function might be a better choice. Or pe – MrFlick Sep 04 '14 at 03:00

2 Answers

0

You certainly can! If you're printing the results to the console with the summary function, I'm assuming there aren't going to be too many days (<100 or so) you want summaries for. In that case a simple for loop might be all you need.

Here's an example with the mtcars dataset (comes with R):

data(mtcars)
summary(mtcars)

for(i in unique(mtcars$cyl)) {
  print(paste("summary for dataset when cyl ==", i))
  print(summary(mtcars[mtcars$cyl==i,]))
}

In your case, I think you could use this to print a summary for days 2-10:

for(i in 2:10) {
  print(paste("summary for day ==", i))
  # the stray !is.na(data) argument is dropped; summary() reports NA counts per column
  sub <- subset(data, Month == "Sep" & Day==i, select = MSLP:Temp)
  print(summary(sub))
}
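
For a version that can be run without the original data, here is a minimal sketch on a made-up data frame. The `climate` object, its column names and its values are assumptions for illustration only. It uses `split()` with `lapply()` as one alternative to the explicit loop, so the per-day summaries are kept in a list rather than only printed.

```r
# Hypothetical stand-in for the asker's data: 3 years of September days 1-12.
set.seed(1)
climate <- expand.grid(Day = 1:12, Year = 1967:1969)
climate$Month <- "Sep"
climate$MSLP  <- round(rnorm(nrow(climate), 1014, 10))
climate$Temp  <- round(rnorm(nrow(climate), 14.7, 2.5), 2)

# Keep only September, days 2 to 10, and the columns needed.
sep <- subset(climate, Month == "Sep" & Day %in% 2:10,
              select = c(Day, MSLP, Temp))

# One summary table per day, stored in a list named by day.
by_day <- lapply(split(sep[c("MSLP", "Temp")], sep$Day), summary)

print(by_day[["2"]])   # summary for day 2
```

Individual days can then be pulled out by name, e.g. `by_day[["10"]]`, or all of them printed with `print(by_day)`.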

You can definitely include a third condition such as Year==1967 in your subset function. Perhaps Year is not defined? The Error in eval(expr, envir, enclos) usually arises when an object is not defined. Try class(data$Year) to make sure data$Year is of type numeric or integer.
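
A small sketch of that check, again on a made-up data frame (the `climate` object and its columns are assumptions, not the asker's data). It confirms the column exists under exactly that name, looks at its type, and then applies the three-way filter.

```r
# Hypothetical data frame; column names and values are assumptions.
climate <- data.frame(
  Year  = c(1967L, 1967L, 1968L),
  Month = c("Sep", "Sep", "Sep"),
  Day   = c(2L, 3L, 2L),
  Temp  = c(14.2, 15.1, 13.8)
)

# Confirm the column exists (names are case-sensitive) and check its type.
has_year  <- "Year" %in% names(climate)
year_type <- class(climate$Year)

# If Year was read in as text or a factor, convert it so that range
# comparisons such as Year >= 1967 behave numerically.
if (!is.numeric(climate$Year)) {
  climate$Year <- as.integer(as.character(climate$Year))
}

sub_1967 <- subset(climate, Month == "Sep" & Day == 2 & Year == 1967)
print(sub_1967)
```

If `has_year` is `FALSE`, the column is probably spelled differently in the data (for example `year` or `YEAR`), which would explain an "object not found" style error.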

ajb
  • ajb, thank you so much for your response and helpful reply! I tested the mtcars example and it works a treat, as does the loop for my own dataset which you very kindly put together! There is only one problem, which is without doubt something I am doing wrong: all the results in the summary lists come up as 'NA' or 'NaN'. I have tried to mess around with the 'na.rm = TRUE' argument, but no matter where I place it in the function, it will not work. EG: sub <- subset(data, Month == "Sep" & Date==i, na.rm=T, select = Sorry for being a pain, but if you could please kindly advise. & ty for xtra tips – Paddy J Sep 04 '14 at 04:24
  • can you copy the output of what R spits out when you run `str(data)` – ajb Sep 04 '14 at 05:02
  • Upon further inspection... if you want to just select the columns named "MSLP" and "Temp", you want: `subset(data, Month == "Sep" & Day==2, select = c("MSLP", "Temp"))` If you want to remove NAs from this dataframe, see this [post.](http://stackoverflow.com/questions/20342435/efficient-method-to-subset-drop-rows-with-na-values-in-r) There might be a cleaner way, but 2 steps will work: `df1 <- subset(data, Month == "Sep" & Day==2, select = c("MSLP", "Temp"))` `df2 <- [rowSums(is.na(df1))==0, ]` – ajb Sep 04 '14 at 05:16
  • Hi ajb, thanks again for response! str(data) 'data.frame': 2168 obs. of 10 variables: $ Year : int 2009 2009 2009 2009 2009 2009 2009 2009 2009 2009 ... $ Month : num 1 1 1 1 1 1 1 1 1 1 ... $ Date : int 1 2 3 4 5 6 7 8 9 10 ... $ Day : int 5 6 7 1 2 3 4 5 6 7 ... $ Full.Date: Factor w/ 2066 levels "","2009-01-01",..: 2 3 4 5 6 7 8 9 10 11 ... $ Max : num 3.3 3.2 2.4 6.6 4.9 1.5 2.4 3.7 6.6 9.1 ... $ Min : num 1.7 -0.7 -3 1.2 -3 -5.8 -5.7 -1.4 0.6 2.8 ... $ Rain : num 0 0 0 0.8 0 0 0 0 0 12.8 ... $ Wind : num 17.4 13.1 7.8 7.5 11.1 6 6.6 – Paddy J Sep 04 '14 at 05:39
  • Just to add, I am using a different, smaller dataset to practice on, with slightly different headers. And as you can see, it's a bit of a mess. Please accept my apologies for all these questions. I attempted the '2 steps', but I am getting the error 'Error: unexpected '[' in "df2 <- (["' R is a fantastic program, but very very frustrating at times. – Paddy J Sep 04 '14 at 05:43
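
The second step in the comment above is missing the data frame name before `[`, which is what triggers the `unexpected '['` parse error. Below is a sketch of the two-step approach with that fixed, on a made-up data frame (the `climate` object and its values are assumptions). `complete.cases()` is an equivalent base R shortcut. The last part shows that a filter matching no rows gives an empty result whose summary is all NA/NaN; that is one possible cause of the all-NA output described above, for example when a numeric `Month` column is compared with "Sep".

```r
# Hypothetical data with some missing values; names and values are assumptions.
climate <- data.frame(
  Month = c(9, 9, 9, 8),
  Day   = c(2, 2, 2, 2),
  MSLP  = c(1012, NA, 1020, 1008),
  Temp  = c(14.1, 15.0, NA, 16.3)
)

# Step 1: pick the rows and columns of interest.
df1 <- subset(climate, Month == 9 & Day == 2, select = c("MSLP", "Temp"))

# Step 2: keep rows with no NA in any column (note the df1 before the bracket).
df2 <- df1[rowSums(is.na(df1)) == 0, ]

# Equivalent shortcut in base R.
df3 <- df1[complete.cases(df1), ]

# Month is numeric here, so comparing it with "Sep" never matches and the
# result has zero rows; the mean of an empty column is NaN.
empty <- subset(climate, Month == "Sep" & Day == 2, select = c("MSLP", "Temp"))

print(summary(df2))
print(nrow(empty))
```

Dropping rows this way discards a whole day's record if any one variable is missing. Leaving the NAs in place is also an option, since `summary()` on a data frame reports an `NA's` count per column.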
0

I eventually got it to work (minus the NAs/NaNs) by tweaking ajb's very helpful loop a little.

The following loop displays individual summaries for selected variables for each of the 31 days in August:

for(i in 1:31) {
  print(paste("summary for date ==", i))
  # select the chosen variables for this date in August
  x <- subset(df, Month == 8 & Date==i, select = V1:V4)
  print(summary(x))
}

It still produces error messages, so obviously it needs to be tweaked more, but the main thing is that the results are still produced.

Edit, found the problem. A useless 'is.na' function placed within the 'x' function produced error messages despite producing desired result. Above function has been edited and produces a cleaner result.

Paddy J