Is there any way to compute a sum instead of a count using cut or any other function in R?

Question

By referring to the link below, I am able to get a frequency count of columns using cut.

Using the approach from that link, I get output like this:

var1                 freq
2015-10-01 10:00:00  10

But I need the sum of the column values recorded in that hour. For example, I have a column value_column with values recorded at different times within a day. How can I sum those values per hour and display the result in a separate column?

value_column  date_time
14            10/1/2015 10:00
10            10/1/2015 10:02
16            10/1/2015 10:03
9             10/1/2015 10:04
1             10/1/2015 10:05
5             10/1/2015 10:06
13            10/1/2015 10:07
21            10/1/2015 10:08
18            10/1/2015 10:09
16            10/1/2015 10:10

Expected output

sum_value_column  date_time
123               10/1/2015 10:00

Thanks in advance.

akrun · Answer 1 · 2015-12-02T07:31:06.677

We can convert the 'date_time' column to POSIXct class, replace the minute part with 00 using format, group by that variable and get the sum of 'value_column' with summarise.

 library(dplyr)
 df1 %>%
     group_by(date_time = format(as.POSIXct(date_time, 
                           format='%m/%d/%Y %H:%M'), '%m/%d/%Y %H:00')) %>% 
     summarise(sum_value_column = sum(value_column))
#            date_time sum_value_column
#            (chr)            (int)
#1 10/01/2015 10:00              123

data

df1 <- structure(list(value_column = c(14L, 10L, 16L, 9L, 1L, 
5L, 13L, 
21L, 18L, 16L), date_time = c("10/1/2015 10:00", "10/1/2015 10:02", 
"10/1/2015 10:03", "10/1/2015 10:04", "10/1/2015 10:05",
"10/1/2015 10:06", 
"10/1/2015 10:07", "10/1/2015 10:08", "10/1/2015 10:09",
"10/1/2015 10:10")), .Names = c("value_column", "date_time"), 
 class = "data.frame", row.names = c(NA, -10L))
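
Since the question asks specifically about cut, here is a minimal base R sketch of the same hourly sum using cut with breaks = "hour". It rebuilds the sample data so that it runs on its own. The names stamp, hour_bin and hourly, and the choice of tz = "UTC", are assumptions made for illustration and are not part of the answer above.

```r
# Sample data from the question (date_time is character, month/day/year).
df1 <- data.frame(
  value_column = c(14L, 10L, 16L, 9L, 1L, 5L, 13L, 21L, 18L, 16L),
  date_time = c("10/1/2015 10:00", "10/1/2015 10:02", "10/1/2015 10:03",
                "10/1/2015 10:04", "10/1/2015 10:05", "10/1/2015 10:06",
                "10/1/2015 10:07", "10/1/2015 10:08", "10/1/2015 10:09",
                "10/1/2015 10:10"),
  stringsAsFactors = FALSE
)

# Parse the timestamps. tz = "UTC" is an assumption; use your own time zone.
stamp <- as.POSIXct(df1$date_time, format = "%m/%d/%Y %H:%M", tz = "UTC")

# cut() with breaks = "hour" assigns each timestamp to the hour it falls in.
hour_bin <- cut(stamp, breaks = "hour")

# Sum value_column within each hourly bin instead of counting rows.
hourly <- aggregate(df1["value_column"], by = list(date_time = hour_bin), FUN = sum)
names(hourly)[2] <- "sum_value_column"
print(hourly)
```

For the sample data this gives a single row whose sum_value_column is 123. The date_time column is a factor holding the start of each hour, so convert it with as.POSIXct if you need real timestamps again.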

score 3 · Answer 2 · edited Dec 02 '15 at 13:42

For SQL users, assuming the input is the data frame data:

library(sqldf)

sqldf("select substr(date_time, 1, instr(date_time, ':')) || '00' date_time, 
              sum(value_column)
       from data
       group by substr(date_time, 1, instr(date_time, ':')) || '00'")

Alternately, we could factor out the complex expression into a nested select statement like this:

sqldf("select date_time, 
              sum(value_column)
       from (select substr(date_time, 1, instr(date_time, ':')) || '00' date_time,
                    value_column
             from data)
       group by date_time")

score 1 · Accepted Answer · answered Dec 02 '15 at 12:01

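I would probably try:

library(stringr)

# make sure date_time is character rather than factor
df1$date_time <- as.character(df1$date_time)

# split "10/1/2015 10:00" into a date part and a time part
df1$date <- str_split_fixed(df1$date_time, " ", 2)[,1]

# the sample dates are month/day/year
df1$date <- as.Date(df1$date, "%m/%d/%Y")

df1$time <- str_split_fixed(df1$date_time, " ", 2)[,2]

# keep only the hour so that all rows within the same hour form one group
df1$hour <- sub(":.*", "", df1$time)

total_table <- aggregate(df1$value_column, by = list(date = df1$date, hour = df1$hour), FUN = sum)

This is a bit long, but it keeps the date and the time as separate columns for any further analysis.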
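
As an illustration of that kind of further analysis, here is a base R sketch of a similar idea without stringr: derive separate date and hour columns, then aggregate on both. The data frame df2 is hypothetical. Its first three rows come from the question and the last two are made up so that more than one group appears. The tz = "UTC" setting is also an assumption.

```r
# First three rows are from the question; the last two are made up for illustration.
df2 <- data.frame(
  value_column = c(14L, 10L, 16L, 7L, 3L),
  date_time = c("10/1/2015 10:00", "10/1/2015 10:02", "10/1/2015 10:03",
                "10/1/2015 11:15", "10/2/2015 10:30"),
  stringsAsFactors = FALSE
)

# Parse once, then derive the grouping columns from the parsed timestamps.
stamp <- as.POSIXct(df2$date_time, format = "%m/%d/%Y %H:%M", tz = "UTC")
df2$date <- as.Date(stamp, tz = "UTC")
df2$hour <- as.integer(format(stamp, "%H"))

# One row per (date, hour) pair, holding the sum of value_column.
totals <- aggregate(value_column ~ date + hour, data = df2, FUN = sum)
totals <- totals[order(totals$date, totals$hour), ]
print(totals)
```

Keeping date and hour as separate columns makes it straightforward to aggregate again later, for example by date alone for daily totals.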
