
I need to solve the following problem in R. I have a large table with columns and rows in this format:

    Day_and_Time    Rain1_mm/min   Rain2_mm/min 
    01.12.10 18:01     0             0
    ..............  ....           ...
    02.12.10 01:00  0.03             0             
    02.12.10 01:01  0.03             0           
    02.12.10 01:02  0.01             0           
    02.12.10 01:03  0.05             0           
    02.12.10 01:04  0.03           0.1           
    02.12.10 01:05  0.04             0  
    ..............  ....           ...
    02.12.10 18:00     0             0

I want to write a function that sums six consecutive rows and returns the result as a new row, so that I end up with a new table that looks like this, for example:

    Day_and_Time           Rain1_mm/5min   Rain2_mm/5min 
    ..............         ....            ... 
    02.12.10 01:05         0.19            0.1           
    02.12.10 01:10         ....            ...
    ..............         ....            ...

Is it possible to do this? The goal is to convert the unit of the two rain columns from [mm/min] to [mm/5min].

Thank you very much!

Frosi
  • Do you have a row for every minute? – aichao Aug 02 '16 at 20:52
  • yes, from 01.12.10 18:01 to 02.12.10 18:00. So 23 hours, 59 Minutes in total! – Frosi Aug 02 '16 at 20:54
  • Definitely possible. Would be nice if you gave some data to produce solutions on using dput(). – s_baldur Aug 02 '16 at 21:06
  • Without knowing how exactly you store your data a concrete answer is not possible. You could take a look at [this](http://stackoverflow.com/questions/15265512/summing-every-n-points-in-r) question which targets a similar problem. – AEF Aug 02 '16 at 21:09
  • http://www.filehosting.org/file/details/590675/daten1_12_10.csv you can ignore the last column! – Frosi Aug 02 '16 at 21:16

2 Answers


Assuming that you read the data in your .csv file as a data frame df, one approach to your problem is to use rollapply from the zoo package to give you a rolling sum:

library(zoo)

ind_keep <- seq(1, nrow(df) - 5, by = 5)                            ## 1.
out <- sapply(df[,-1], function(x) rollapply(x,6,sum))              ## 2.
out <- data.frame(df[ind_keep+5,1],out[ind_keep,])                  ## 3.
colnames(out) <- c("Day_and_time","Rain1_mm/5min","Rain2_mm/5min")  ## 4.

Notes:

  1. Here, we define the indices corresponding to every 5 minutes where we want to keep the rolling sum over the next 5 minutes.
  2. Apply a rolling sum function for each column.
    • Use sapply over all columns of df except the first. Note that the column indices specified in df[,-1] can be adjusted so that you process only certain columns.
    • The function to apply is rollapply from the zoo package. The additional arguments are the window width, 6 (six one-minute rows covering the closed interval [i, i+5]), and the sum function, so that this performs a rolling sum. At this point, out contains the rolling sum starting at each minute, but we only want those every 5 minutes. Therefore,
  3. Combine the Day_and_Time column from the original df with out, keeping only the rows at every 5 minutes. Note that we keep the last Day_and_Time in each window.
  4. This just renames the columns.

Using MikeyMike's data, which is

          Day_and_Time rain1 rain2
1  2010-02-12 01:00:00  0.03  0.00
2  2010-02-12 01:01:00  0.03  0.00
3  2010-02-12 01:02:00  0.01  0.00
4  2010-02-12 01:03:00  0.05  0.00
5  2010-02-12 01:04:00  0.03  0.10
6  2010-02-12 01:05:00  0.04  0.00
7  2010-02-12 01:06:00  0.02  0.10
8  2010-02-12 01:07:00  0.10  0.10
9  2010-02-12 01:08:00  0.30  0.00
10 2010-02-12 01:09:00  0.01  0.00
11 2010-02-12 01:10:00  0.00  0.01

this gives:

print(out)
##         Day_and_time Rain1_mm/5min Rain2_mm/5min
##1 2010-02-12 01:05:00          0.19          0.10
##2 2010-02-12 01:10:00          0.47          0.21

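Note the difference from the other answer's result (0.47 rather than 0.43 for Rain1 in the second row). This approach assumes you want overlapping windows, since you specified that you want to sum the six numbers in the closed interval [i, i+5] at each 5 minute mark, so the row at 01:05 is counted in both windows.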
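The overlapping-window sums can be checked by hand without zoo. The following is only a minimal base R sketch, assuming the rain1 values from the sample data above; the variable names (width, cs, roll, result) are made up for illustration and are not part of the answer's code.

```r
# rain1 column from the sample data (11 one-minute readings).
rain1 <- c(0.03, 0.03, 0.01, 0.05, 0.03, 0.04, 0.02, 0.10, 0.30, 0.01, 0.00)

width <- 6  # six readings per window, i.e. the closed interval [i, i+5]

# Rolling sum via differences of a cumulative sum:
# roll[i] equals sum(rain1[i:(i + width - 1)]).
cs <- cumsum(c(0, rain1))
roll <- cs[(width + 1):length(cs)] - cs[1:(length(cs) - width)]

# Keep every 5th window start, as in step 3 of the notes.
ind_keep <- seq(1, length(roll), by = 5)
result <- round(roll[ind_keep], 2)
print(result)
## [1] 0.19 0.47
```

The reading at 01:05 (0.04) closes the first window and opens the second, which is why the second sum is 0.47.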


To extend the above to a window in the closed interval [i, i+nMin] at each nMin mark:

library(zoo)
nMin <- 10      ## for example 10 minutes
ind_keep <- seq(1, nrow(df) - nMin, by=nMin)
out <- sapply(df[,-1], function(x) rollapply(x, nMin+1, sum))
out <- data.frame(df[ind_keep+nMin, 1],out[ind_keep,])
colnames(out) <- c("Day_and_time",paste0("Rain1_mm/",nMin,"min"),paste0("Rain2_mm/",nMin,"min"))

For this to work, the data must have at least 2 * nMin + 1 rows.
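Since the question asks for a function, the steps above could be wrapped up roughly as follows. This is a sketch under assumptions, not a tested general solution: the name sum_every_n is hypothetical, the first column is assumed to hold the timestamps, all other columns are assumed numeric, df is assumed to be a plain data frame, and the zoo package must be installed. Window starts stop at nrow(df) - nMin so that every kept window is complete.

```r
library(zoo)

# Hypothetical helper: sums the closed interval [i, i + nMin] at every
# nMin-th row and labels each sum with the last timestamp in its window.
sum_every_n <- function(df, nMin) {
  stopifnot(nrow(df) >= 2 * nMin + 1)            # at least two windows
  ind_keep <- seq(1, nrow(df) - nMin, by = nMin) # window start rows
  sums <- sapply(df[, -1, drop = FALSE],
                 function(x) rollapply(x, nMin + 1, sum))
  out <- data.frame(df[ind_keep + nMin, 1], sums[ind_keep, , drop = FALSE])
  colnames(out) <- c(colnames(df)[1],
                     paste0(colnames(df)[-1], "_per_", nMin, "min"))
  out
}

# Usage with the 11 sample rows (time zone chosen arbitrarily as UTC).
df <- data.frame(
  Day_and_Time = as.POSIXct("2010-02-12 01:00:00", tz = "UTC") + 60 * (0:10),
  rain1 = c(0.03, 0.03, 0.01, 0.05, 0.03, 0.04, 0.02, 0.10, 0.30, 0.01, 0.00),
  rain2 = c(0, 0, 0, 0, 0.1, 0, 0.1, 0.1, 0, 0, 0.01)
)
out <- sum_every_n(df, 5)
print(out)
```

With nMin = 5 this should reproduce the two rows shown earlier (01:05 and 01:10); calling sum_every_n(df, 10) on the full day of data would give the mm/10min variant.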

Hope this helps.

aichao
  • What do I have to change to get the unit mm/10min - so how to sum up 10 rows? Can I write [i,i+10]? – Frosi Aug 13 '16 at 13:53

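Assuming you want non-overlapping groups (minutes 0 - 5, 6 - 10, etc. in the sample data), this should give you what you're looking for. Note that the grouping key bins the timestamps into 360-second (six-minute) intervals, so on longer data the bins continue as 12 - 17, 18 - 23, and so on: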

library(data.table)
setDT(df)[,.(day_time = max(Day_and_Time),
             rain1_sum=sum(rain1),
             rain2_sum=sum(rain2)),
          by=.(floor(as.numeric(Day_and_Time)/360))]

     floor            day_time rain1_sum rain2_sum
1: 3516540 2010-02-12 01:05:00      0.19      0.10
2: 3516541 2010-02-12 01:10:00      0.43      0.21
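To see what the grouping key does, the same sums can be reproduced in base R. This is only an illustrative sketch, assuming the sample values from the Data section; secs, bin and sums are made-up names, and rowsum stands in for the data.table aggregation.

```r
# Timestamps of the sample data as seconds since the epoch, one per minute.
secs <- 1265954400 + 60 * (0:10)
rain1 <- c(0.03, 0.03, 0.01, 0.05, 0.03, 0.04, 0.02, 0.10, 0.30, 0.01, 0.00)
rain2 <- c(0, 0, 0, 0, 0.1, 0, 0.1, 0.1, 0, 0, 0.01)

# Same key as in the data.table call: one bin per 360 seconds.
bin <- floor(secs / 360)
print(table(bin))  # six rows fall in the first bin, five in the second

# Sum both rain columns within each bin.
sums <- round(rowsum(cbind(rain1, rain2), group = bin), 2)
print(sums)
```

Because the bins do not overlap, the reading at 01:05 is counted only once, which gives 0.43 for rain1 in the second bin.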

Data:

df <- structure(list(Day_and_Time = structure(c(1265954400, 1265954460, 
1265954520, 1265954580, 1265954640, 1265954700, 1265954760, 1265954820, 
1265954880, 1265954940, 1265955000), class = c("POSIXct", "POSIXt"
), tzone = ""), rain1 = c(0.03, 0.03, 0.01, 0.05, 0.03, 0.04, 
0.02, 0.1, 0.3, 0.01, 0), rain2 = c(0, 0, 0, 0, 0.1, 0, 0.1, 
0.1, 0, 0, 0.01)), .Names = c("Day_and_Time", "rain1", "rain2"
), row.names = c(NA, -11L), class = c("data.table", "data.frame"
))
Mike H.