0

I have 5 columns in my data table id, created_time, amount, balance, and created_month. I want the last row for each id and created_month , I am trying to get all 5 columns grouped by created_month and id.

Input datatable test:

id  created_time amount balance  created_month
 1 1/15/14 10:17      2       1         1/1/14
 1 1/15/14 11:17      2       1         1/1/14
 1 1/15/14 20:17      2       1         1/1/14
 2 1/15/14 11:17      2       1         1/1/14
 2 1/16/14 12:17      2       1         1/1/14
 2 2/16/14 23:17      2       1         2/1/14

I have sorted by id and created_time as

setkeyv(test, c("id","created_time"))

I need to

  1. Convert the created_month to show the first day of the month . Similar to date_trunc('month',created_month) in Sql .
  2. Sort the values by created_time column and get all the columns grouped by 'id' and 'created_month'

The below one only gives me balance since I have only one field in tail option

test[ ,  tail(balance,1L) , by=c("balanceable_id","created_month" )]

I am not sure how to add multiple fields to tail to display all columns in original table.

My goal is to get this data table:

id created_month        created_time amount balance
 1    2014-01-01 2014-01-15 20:17:00      2       1
 2    2014-01-01 2014-01-16 12:17:00      2       1
 2    2014-02-01 2014-02-16 23:17:00      2       1
Prem
  • 11,775
  • 1
  • 19
  • 33
zoomi
  • 45
  • 5
  • Welcome to StackOverflow! For a first-time question in the r tag, this was a nice attempt. However, for code debugging please always ask with [reproducible](https://stackoverflow.com/q/5963269/1422451) code/data per the [MCVE](https://stackoverflow.com/help/mcve) and [`r`](https://stackoverflow.com/tags/r/info) tag description, with the desired output. You may wish to use `dput()` for sharing per the tag description. It is an easy way to reproduce R data. You can put the messy `dput(data_frame_name)` or `dput(head(data_frame_name))` output at the bottom of your question (click edit). – Hack-R Jul 04 '18 at 17:52
  • Also, what's this `Datatable --> test`? That doesn't look like normal R syntax. Is this syntax from some library? – Hack-R Jul 04 '18 at 17:58

1 Answers1

1

One of the approach could be

library(data.table)
library(lubridate)

setDT(df)[, created_time := as.POSIXct(created_time, "%m/%d/%y %H:%M", tz = "GMT")  #convert to timestamp format
          ][, created_month := floor_date(created_time, "month")    #add a column having 1st day of created_time's month
            ][order(id, created_month)    
              ][, .SD[.N], .(id, created_month)]                    #fetch last records

which gives

   id created_month        created_time amount balance
1:  1    2014-01-01 2014-01-15 20:17:00      2       1
2:  2    2014-01-01 2014-01-16 12:17:00      2       1
3:  2    2014-02-01 2014-02-16 23:17:00      2       1


Sample data:

df <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L), created_time = c("1/15/14 10:17", 
"1/15/14 11:17", "1/15/14 20:17", "1/15/14 11:17", "1/16/14 12:17", 
"2/16/14 23:17"), amount = c(2L, 2L, 2L, 2L, 2L, 2L), balance = c(1L, 
1L, 1L, 1L, 1L, 1L)), .Names = c("id", "created_time", "amount", 
"balance"), class = "data.frame", row.names = c(NA, -6L))

#  id  created_time amount balance
#1  1 1/15/14 10:17      2       1
#2  1 1/15/14 11:17      2       1
#3  1 1/15/14 20:17      2       1
#4  2 1/15/14 11:17      2       1
#5  2 1/16/14 12:17      2       1
#6  2 2/16/14 23:17      2       1
Prem
  • 11,775
  • 1
  • 19
  • 33