I have hourly values for precipitation that I'd like to sum up over the hour.

My data (Nd_hourly) looks like this:

       Datum  Uhrzeit  Nd
1 2013-05-01 01:00:00 0.0
2 2013-05-01 02:00:00 0.1
3 2013-05-01 03:00:00 0.0
4 2013-05-01 04:00:00 0.3

(date, time, precipitation)

and I'd like to have an output of Datum - Nd

I computed the daily min and max temperature with the plyr package and its ddply function:

t_maxmin=ddply(t_air,.(Datum),summarize,Datum=Datum[which.max(T_Luft)],max.value=max(T_Luft),min.value=min(T_Luft))

I then tried to do something similar for the precipitation and tried

Nd_daily=ddply(Nd_hourly,.(Datum),summarize,Datum=Datum, sum(Nd_hourly))

but get the error message

Error: only defined on a data frame with all numeric variables

I assume something may be wrong with my data input? I imported data from Excel 2010 via a .txt file.

Still very new to R and programming in general, so I would really appreciate some help :)

Anne
  • What do you mean by "sum up over the hour" and "an output of Datum - Nd"? Do you have multiple observations for each date and hour? Or do you mean to sum up hourly `Nd` observations for each day? – jbaums Apr 01 '14 at 07:55
  • We cannot reproduce the error with your toy data. Please check these links for general ideas on how to create a reproducible example, and how to do it in R: [**here**](http://stackoverflow.com/help/mcve), [**here**](http://www.sscce.org/), and [**here**](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). – Henrik Apr 01 '14 at 12:37

2 Answers

I think @Henrik has identified your problem, but here's an alternative approach, using data.table:

# Create some fake datetime data
datetime <- seq(ISOdate(2000,1,1), ISOdate(2000,1,10), "hours")

# A data.frame with columns for date, time, and random precipitation data. 
DF <- data.frame(date=format(datetime, "%Y-%m-%d"),
                 time=format(datetime, "%H:%M:%S"),
                 precip=runif(length(datetime)))

head(DF)

#         date     time    precip
# 1 2000-01-01 12:00:00 0.9294353
# 2 2000-01-01 13:00:00 0.5082905
# 3 2000-01-01 14:00:00 0.5222088
# 4 2000-01-01 15:00:00 0.1841305
# 5 2000-01-01 16:00:00 0.9121000
# 6 2000-01-01 17:00:00 0.2434706

library(data.table)
DT <- as.data.table(DF) # convert to a data.table
DT[, list(precip=sum(precip)), by=date]

#           date    precip
#  1: 2000-01-01  7.563350
#  2: 2000-01-02 10.147659
#  3: 2000-01-03 10.936760
#  4: 2000-01-04 13.925727
#  5: 2000-01-05 11.415149
#  6: 2000-01-06 10.966494
#  7: 2000-01-07 12.751461
#  8: 2000-01-08 15.218148
#  9: 2000-01-09 12.213046
# 10: 2000-01-10  6.219439

There's a great introductory text on data.tables here.

Given your particular data structure, the following should do the trick.

library(data.table)
DT <- data.table(Nd_hourly)
DT[, list(Nd_daily=sum(Nd)), by=Datum]
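
To sanity-check the grouping without installing anything, here is a minimal base R sketch. It rebuilds the four rows shown in the question by hand and swaps in `aggregate` for the data.table call, so treat it as an illustration rather than the exact method above; the `Nd_daily` name is simply reused from the question.

```r
# Rebuild the four sample rows from the question (typed in by hand)
Nd_hourly <- data.frame(
  Datum   = c("2013-05-01", "2013-05-01", "2013-05-01", "2013-05-01"),
  Uhrzeit = c("01:00:00", "02:00:00", "03:00:00", "04:00:00"),
  Nd      = c(0.0, 0.1, 0.0, 0.3)
)

# Sum the hourly precipitation within each date (base R, no packages needed)
Nd_daily <- aggregate(Nd ~ Datum, data = Nd_hourly, FUN = sum)
print(Nd_daily)
```

If the daily totals from the real data differ wildly from what a hand calculation like this suggests, the column type of `Nd` is the first thing worth checking.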
jbaums
  • Thank you for your help :) I installed the package and I do get a daily value. However, the values are very obviously wrong. In the first day I should have a total precipitation of 0,6. My result for that day is 30. Any idea what went wrong? – Anne Apr 01 '14 at 11:52
  • Without seeing more of your data it's hard to say. Does `sapply(DT, class)` indicate that `Nd` is numeric? Is it giving you _exactly_ 30? – jbaums Apr 01 '14 at 12:02
  • @jbaums, see also the OP's comment on my answer. It sounds like 'Nd' is a factor. `ddply` fails, while it seems that `data.table` sums an `as.numeric` version of the factor. – Henrik Apr 01 '14 at 12:12
  • Okay, I just cleared the environment, reloaded the input and did the same as before and now it works perfectly... seems like my (failed) housekeeping and all the trial and errors screwed it up... Just checked and Nd is numeric while the other two are factors. But it works great now. Thank you a lot :) – Anne Apr 01 '14 at 12:33
  • @user3483945, it is kind of you to say thank you when someone has helped you. An additional way to acknowledge people that spend their time helping you is to vote on helpful answers. Please read [**about Stackoverflow**](http://stackoverflow.com/about), [**what to do when someone answers**](http://stackoverflow.com/help/someone-answers), and [about **voting**](http://stackoverflow.com/help/why-vote). Cheers. – Henrik Apr 01 '14 at 13:23

Is this what you want?

library(plyr)
ddply(.data = df, .variables = .(Datum), summarize,
      sum_precip = sum(Nd))
#        Datum sum_precip
# 1 2013-05-01        0.4
Henrik
  • Thank you for your help. The "Datum=Datum" was a leftover from when I did the max and min temperature... Your comment helped me a lot in understanding how the ddply function actually works. I just tried it and get the following ERROR message: "sum not meaningful for factors". I guess that means the function works but cannot be applied to my data? – Anne Apr 01 '14 at 11:57
  • I guess that, for some reason, R has interpreted 'Nd' as a character when you read the data. Default behaviour of `read.table` is then to convert character to factor. You should check the variable carefully to try to detect what R might have interpreted as a character in a variable that is supposed to be numerical. It is impossible to tell from the sample data you provided (which works just fine for me as you can see). – Henrik Apr 01 '14 at 12:06
  • Hi, the same as the suggestion above: your method worked perfectly, once I cleared the environment and started anew with loading the input data and running the script. Seems like all the trials and errors messed something up. Thank you a lot for all the help :) – Anne Apr 01 '14 at 12:36
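
For readers who hit the same "sum not meaningful for factors" error, the following is a rough sketch of how the column type could be checked and repaired. It assumes the cause is a decimal comma in one value, which is only a guess based on the "0,6" in the comments above; the sample text and the `Nd_num` name are made up for illustration. `stringsAsFactors = TRUE` is set explicitly because it stopped being the default in R 4.0.0.

```r
# Hypothetical input: one value uses a decimal comma, so Nd is read as text
raw <- "Datum Uhrzeit Nd
2013-05-01 01:00:00 0.0
2013-05-01 02:00:00 0,1
2013-05-01 03:00:00 0.0
2013-05-01 04:00:00 0.3"

Nd_hourly <- read.table(text = raw, header = TRUE, stringsAsFactors = TRUE)

# Inspect the column classes; here Nd comes back as a factor, not numeric
print(sapply(Nd_hourly, class))

# Convert via character (not directly, which would return the level codes),
# replacing the decimal comma with a point first
Nd_num <- as.numeric(sub(",", ".", as.character(Nd_hourly$Nd), fixed = TRUE))
print(sum(Nd_num))
```

If every value in the file uses a decimal comma, passing `dec = ","` to `read.table` when importing is likely the simpler fix.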