I have a data frame in R that I loaded from a CSV file, and I am trying to find the maximum temperature for each day. The data frame is formatted so that the Date column holds the timestamp (YYYY-MM-DD HH:mm format) and the Temp column holds the temperature at that date/time. I tried splitting the data into subsets, working top down (years, then months in each year, then days in each month), but found it very complicated.

Here is a sample of the data frame:

                 Date Unit Temp
1 2012-10-21 21:14:00    C 82.5
2 2012-10-21 21:34:00    C 37.5
3 2012-10-21 21:54:00    C 20.0
4 2012-10-21 22:14:00    C 26.5
5 2012-10-21 22:34:00    C 20.0
6 2012-10-21 22:54:00    C 19.0
user2498712
  • Use `dput` or `head` to post some of your data frame for specific answers. See: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – harkmug Jun 18 '13 at 20:59

4 Answers

The function apply.daily in the package xts does exactly what you want.

# install.packages("xts")  # run once if the package is not installed yet
library(xts)

tmp <- data.frame(Date = seq(as.POSIXct("2013-06-18 10:00"),
    length.out = 100, by = "6 hours"),
    Unit = "C",
    Temp = rnorm(n = 100, mean = 20, sd = 5)) # thanks to dickoa for this code

head(tmp)
data <- xts(x=tmp[ ,3], order.by=tmp[,1])
attr(data, 'Unit') <- tmp[,'Unit']
attr(data, 'Unit')

dMax <- apply.daily(data, max)
head(dMax)
sfuj

I would create a column that was day of year (DoY), then use the aggregate function to find the maximum temperature for each DoY.

E.g., say that your data.frame is called Data, and Data has two columns: the first is named "Date", and the second is named "Temperature". I would do the following:

Data[,"DoY"] <- format(Data[,"Date"], format="%j") # make sure that Data[,"Date"] is already a date-time class, e.g., see as.POSIXct()
MaxTemps <- aggregate(Data[,"Temperature"], by=list(Data[,"DoY"]), FUN=max) # can add na.rm=TRUE if there are missing values

MaxTemps should contain the maximum temperatures observed on each day. If, however, there are multiple years in your data set such that, e.g., day 169 (today) repeats more than once (e.g., today, and 1 year ago), you could do the following:

Data[,"DoY"] <- format(Data[,"Date"], format="%Y_%j") # notice the date format, which is unique for every combination of year and day of year
MaxTemps <- aggregate(Data[,"Temperature"], by=list(Data[,"DoY"]), FUN=max) # can add na.rm=TRUE if there are missing values
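
As a rough illustration of the year-aware version, the sketch below runs the same two steps on a few made-up readings. The contents of `Data` and the UTC time zone are assumptions for the example only.

```r
# Hypothetical sample: four readings spread over two calendar days (UTC assumed)
Data <- data.frame(
  Date = as.POSIXct(c("2012-10-21 21:14:00", "2012-10-21 21:34:00",
                      "2012-10-22 00:14:00", "2012-10-22 00:34:00"),
                    tz = "UTC"),
  Temperature = c(82.5, 37.5, 26.5, 20.0)
)

# Year plus day of year, so the same day in different years stays separate
Data[, "DoY"] <- format(Data[, "Date"], format = "%Y_%j")

# One row per day; aggregate() names the result columns Group.1 and x by default
MaxTemps <- aggregate(Data[, "Temperature"], by = list(Data[, "DoY"]), FUN = max)
print(MaxTemps)
```

Renaming the result with something like `names(MaxTemps) <- c("DoY", "MaxTemp")` may make it easier to read afterwards.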

I hope that this helps!

rbatt

Without a reproducible example, this is not an easy task.

That being said, you can use lubridate (date management) and plyr (split-apply) to solve this problem.

Let's first create some data similar to yours:

set.seed(123)
tmp <- data.frame(Date = seq(as.POSIXct("2013-06-18 10:00"),
                  length.out = 100, by = "6 hours"),
                  Unit = "C",
                  Temp = rnorm(n = 100, mean = 20, sd = 5))
str(tmp)
## 'data.frame':    100 obs. of  3 variables:
##  $ Date: POSIXct, format: "2013-06-18 10:00:00" ...
##  $ Unit: Factor w/ 1 level "C": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Temp: num  17.2 18.8 27.8 20.4 20.6 ...


write.csv(tmp, "/tmp/tmp.csv", row.names = FALSE)
rm(tmp)

Now we can compute the maximum

require(lubridate)
require(plyr)

### "NULL" skips the second column (the unit) on import
tmp <- read.csv("/tmp/tmp.csv",
                colClasses = c("POSIXct", "NULL", "numeric"))


tmp <- transform(tmp, jday = yday(Date))


ddply(tmp, .(jday), summarise, max_temp = max(Temp))

##    jday max_temp
## 1   169   27.794
## 2   170   28.575
## 3   171   26.120
## 4   172   22.004
## 5   173   28.935
## 6   174   18.910
## 7   175   24.189
## 8   176   26.269
## 9   177   24.476
## 10  178   23.443
## 11  179   18.960
## 12  180   30.845
## 13  181   23.900
## 14  182   26.843
## 15  183   27.582
## 16  184   21.898
...................
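
If `read.csv` cannot convert the Date column by itself (for instance, when some rows use a different layout), one option is to read it as text and parse it with an explicit format. This is only a base-R sketch: the `raw` data frame, the format string, and the UTC time zone are assumptions, and rows that do not match the format become `NA` so they can be inspected.

```r
# Hypothetical raw input: timestamps held as plain text, one of them malformed
raw <- data.frame(
  Date = c("2012-10-21 21:14:00", "2012-10-21 21:34:00", "21/10/2012 21:54"),
  Temp = c(82.5, 37.5, 20.0),
  stringsAsFactors = FALSE
)

# Parse with an explicit format; strings that do not match become NA
raw$Parsed <- as.POSIXct(raw$Date, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Rows that failed to parse, so the offending values can be inspected
bad <- raw[is.na(raw$Parsed), ]
print(bad$Date)

# Daily maxima from the rows that parsed cleanly
good <- raw[!is.na(raw$Parsed), ]
good$Day <- as.Date(good$Parsed, tz = "UTC")
daily <- aggregate(Temp ~ Day, data = good, FUN = max)
print(daily)
```

Grouping by the full calendar date, as done here, also keeps the same day of year from different years apart.
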
dickoa
  • Sorry, I should add a sample from my data frame; let me do this now. I am new to R and Stack Overflow! – user2498712 Jun 18 '13 at 21:17
  • @user2498712 I updated my answer according to your data structure. Try it to see if it works – dickoa Jun 18 '13 at 21:23
  • I get this error message: > ddply(tmp, .(jday), summarise, max_temp = max(Temp)) Error in attributes(out) <- attributes(col) : 'names' attribute [9] must be the same length as the vector [4] – user2498712 Jun 18 '13 at 21:29
  • @user2498712 I created a full reproducible example close to yours. It should work normally – dickoa Jun 18 '13 at 21:45
  • Ah! Unfortunately, it didn't work. It is now telling me: Error in as.POSIXlt.character(x, tz, ...) : character string is not in a standard unambiguous format – user2498712 Jun 18 '13 at 22:00
  • @user2498712 With your sample data it works for me. Maybe there are some dates which are not in the right format. Without the full data set it will be difficult to tell. Sorry – dickoa Jun 18 '13 at 22:20
  • I was able to get it to work! However, if I want to convert the jday back into yday, what origin was used for the original conversion? Thank you so much for your help @dickoa ! – user2498712 Jun 19 '13 at 02:35
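
On the last question in the comments: `yday()` counts days within the calendar year, with 1 meaning January 1, so there is no single origin that works across years. Below is a sketch of one way back to a calendar date, assuming the year is known (2013 for the simulated data in this answer); the `jday` values are picked for illustration.

```r
# Day-of-year values as returned by lubridate::yday(), with the year assumed known
jday <- c(169, 170, 184)
year <- 2013

# Day 1 is January 1, so subtract 1 before adding to that origin
dates <- as.Date(jday - 1, origin = paste0(year, "-01-01"))
print(dates)
```

Grouping by the full date (or by year plus day of year) from the start avoids the need for this conversion.
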

I will assume you have a data frame called df with variables date and temp. This code is untested, but it may work, with a little luck.

library(lubridate)
df$justday <- floor_date(df$date, "day")

# for just the maxima, you could use this:
tapply(df$temp, df$justday, max)

# if you would rather have the results in a data frame, use this:
aggregate(temp ~ justday, data=df, FUN=max)
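
To show the shape of the two results without needing lubridate, here is a small base-R sketch. The sample `df`, the UTC time zone, and the use of `as.Date()` in place of `floor_date()` are assumptions for illustration, not part of the code above.

```r
# Hypothetical sample data with the column names assumed in this answer
df <- data.frame(
  date = as.POSIXct(c("2012-10-21 21:14:00", "2012-10-21 22:54:00",
                      "2012-10-22 06:00:00"), tz = "UTC"),
  temp = c(82.5, 19.0, 15.5)
)

# as.Date() drops the time of day, a base-R stand-in for floor_date(x, "day")
df$justday <- as.Date(df$date, tz = "UTC")

# tapply() returns a named array, one element per day
maxima <- tapply(df$temp, df$justday, max)
print(maxima)

# aggregate() returns a data frame and needs FUN to be supplied
daily <- aggregate(temp ~ justday, data = df, FUN = max)
print(daily)
```
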
Jean V. Adams